ABSTRACT


This report develops improvements to a new project scheduling procedure,
Statistical PERT, being developed at the Institute of Statistics, Texas A&M
University. The project scheduling algorithm is a five-step iterative
procedure capable of determining a minimum cost project schedule when
the activities making up the project have durations which are random
variables. The cost of an activity is assumed to be a convex piecewise
linear function of the activity's mean duration. The problem is to
determine the activity mean durations which both minimize the total
project cost and ensure that the mean (or some specified percentile)
of the corresponding project completion time distribution is less than
or equal to a specified project deadline. The entire distribution of
the project's completion time under the minimum cost schedule is a
valuable by-product.

A critical step in the proposed procedure, Subnetwork Analysis, is
improved and extended. Subnetwork Analysis determines an estimate
of the duration distribution, F(t), for each subnetwork identified in
the previous steps. This estimate is extended to include an
extrapolation of upper and lower bounds on F(t). This report also
develops a new sampling procedure which results in improved estimators
for the bounds on F(t).
1.  A STATISTICAL APPROACH TO PROJECT SCHEDULING

1.1  Introduction

The  many  technological  advances  of  the  last  century  have 
resulted  in  a drastic  increase  in  the  magnitude  and  complexity  of 
man’s  enterprises.  This,  in  turn,  has  brought  about  an  acute  need 
for  detailed  and  effective  project  planning.  Thus,  in  recent  years, 
a search  for  a general  technique  which  can  be  employed  to  simplify 
the  task  of  cost-effective  project  scheduling  has  been  undertaken. 

A host of promising strategies have been proposed, and a few have
even enjoyed widespread use. However, the methods currently in use
have possibly serious shortcomings (see, for example, Sielken and
Hartley (1977)). Therefore, under the sponsorship of the Office of
Naval Research, the Institute of Statistics has undertaken the
development, implementation, and evaluation of a new project scheduling
system that yields reliable results and can be economically applied to
very large scheduling problems. This report is a part of that
undertaking.

1.2  The  Project  Scheduling  Problem 

Project scheduling problems arise in a wide variety of contexts.
Consequently, a number of varying formulations of the problem are
currently in use. Since these formulations are not all exactly
equivalent, this subsection gives the specific formulation considered
in this work.

A project is, in general, made up of a series of "tasks" or
"activities" which consume time. These activities are represented
graphically by directed arcs. The origin point and terminal point
of an arc are both called "nodes". The graphical representation of a
project, showing the precedence relationships among the various
activities, is called a "network"; the first node in a network is
usually referred to as the "source" while the last node is usually
called the "sink". In addition, the following basic rules are adhered to:

1)  Before a particular activity may begin, every other activity
whose terminal node is that activity's origin node must be
completed.

2)  Arcs  imply  logical  precedence  only;  the  length  of  the  arc 
has  no  significance. 

3)  The  network  cannot  contain  any  loops  or  cycles. 

For  example,  a small  project  might  consist  of  activities  A,  B,  C,  D, 
and  E with  the  following  precedence  relationships: 

i)  A must  be  completed  before  either  C or  D can  be  started; 
ii)  B must be completed before D can be started; and
iii)  C and  D must  both  be  completed  before  E can  be  started. 

The  corresponding  network  representation  is  shown  in  Figure  1.  The 
arc  labeled  F does  not  correspond  to  any  "real"  activity  but  is  a 
"dummy"  activity  merely  representing  the  precedence  relation  that  A 
must  be  completed  before  D can  be  started.  The  circles  numbered 
1,  2,  ...,  5 represent  the  activities'  origin  and  terminal  nodes. 
Figure 1

A small  project  represented  as  a directed  network. 

The time required to complete an activity is a random variable.
The cost of an activity is a convex piecewise linear function of the
activity's mean duration. Thus, a "project schedule" is a
specification of each activity's mean duration. The "total project
cost" is simply the sum of the corresponding activity costs. The time
to complete the entire project is a random variable whose distribution
depends upon the activity duration distributions. The objective is
to determine a minimum cost project schedule such that the mean (or
some percentile) of the corresponding project completion time
distribution is less than or equal to a specified project deadline.
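
The formulation just described can be written compactly as an optimization problem. The statement below is only a sketch: the symbols $N$, $m_i$, $c_i$, $T(m)$, $D$, and $\alpha$ are introduced here for illustration and are not notation used elsewhere in this report.

```latex
\begin{aligned}
\min_{m_1,\dots,m_N}\quad & \sum_{i=1}^{N} c_i(m_i) \\
\text{subject to}\quad & \mathrm{E}\,[\,T(m)\,] \le D
  \quad\text{or}\quad
  \inf\{\, t : \Pr[\,T(m) \le t\,] \ge \alpha \,\} \le D .
\end{aligned}
```

Here $m_i$ is the mean duration chosen for activity $i$, $c_i$ is that activity's convex piecewise linear cost, $T(m)$ is the random project completion time under the schedule $m = (m_1, \dots, m_N)$, $D$ is the project deadline, and $\alpha$ is the percentile level when a percentile rather than the mean is constrained.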
1.3  Outline of the New Approach to
Project Scheduling

In 1974 the development of a new approach to project scheduling
was begun with the support of the Office of Naval Research. The new
project scheduling procedure that has resulted is an iterative
algorithm involving the following five general steps:

Step 1.  Deterministic Scheduling: Find a minimum cost
project schedule which completes the project by
TARGET TIME when each activity's duration is
exactly its mean duration and hence deterministic
instead of random. (The initial value of TARGET
TIME is usually the specified project deadline.)

Step 2.  Simplification: Let each activity's duration be
a random variable with distribution corresponding
to that activity's mean duration chosen during
Deterministic Scheduling. Replace various
configurations of activities by single activities. The
duration distribution for a replacement activity
is the distribution of the time to complete all of
the activities in the configuration it is replacing.
The result of this step is a simplified project
network with fewer activities.

Step 3.  Decomposition: Partition the simplified project
network into several subnetworks in such a way that
the resultant subnetworks can be linked together in
either series or parallel to form the simplified
project network.

Step 4.  Subnetwork Analysis: Analyze separately each of the
subnetworks determined during Decomposition. Within
a subnetwork each activity's duration distribution
is approximated by a two-point discrete distribution
with matching mean, variance, and third moment.
Determine the subnetwork duration distribution
corresponding to these discrete activity duration
distributions.

Step 5.  Synthesis: Combine the approximate subnetwork duration
distributions to obtain an approximate completion time
distribution for the entire project. If the mean
(or some specified percentile), T, of this project
completion time distribution is sufficiently close
to the specified project deadline, the "optimal" project
schedule has been found. Otherwise, reset TARGET TIME to
    New TARGET TIME = Old TARGET TIME * (Project Deadline / T)
and return to Step 1.
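
The reset rule in Step 5 can be sketched in a few lines of Python. This is an illustration only: the function names and the relative tolerance used for "sufficiently close" are assumptions, since the report does not state a numerical stopping criterion here.

```python
def next_target_time(old_target: float, deadline: float, achieved: float) -> float:
    """Step 5 reset rule: New TARGET TIME = Old TARGET TIME * (Deadline / T)."""
    if achieved <= 0:
        raise ValueError("achieved completion time T must be positive")
    return old_target * (deadline / achieved)


def is_close_enough(deadline: float, achieved: float, rel_tol: float = 0.01) -> bool:
    """Hypothetical stopping test; the report only says 'sufficiently close'."""
    return abs(achieved - deadline) <= rel_tol * deadline


# One pass: the deterministic schedule met TARGET TIME = 100, but the
# stochastic analysis gave T = 110 against a deadline of 100.
target = next_target_time(old_target=100.0, deadline=100.0, achieved=110.0)
print(round(target, 2))  # 90.91
```

Because T exceeded the deadline, the next deterministic schedule is asked to finish earlier, which pulls the stochastic completion time back toward the deadline.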

A general  discussion  and  relatively  nonmathematical  overview  of  this 
project  scheduling  procedure  is  contained  in  Baker  and  Sielken  (1978) (see  also 
Sielken  and  Hartley  (1977)).  The  detailed  documentation  of  the  development 
thus  far  of  each  step  is  as  follows: 

Step  1.  Dunn  and  Sielken  (1977); 

Step  2.  Hartley  and  Wortham  (1966)  and  Ringer  (1969); 

Step  3.  Sielken  and  Fisher  (1976); 
Step 4.  Sielken, Ringer, Hartley, and Arseven (1974) and Sielken,
Hartley, and Spoeri (1976);

Step  5.  Sielken,  Ringer,  Hartley,  and  Arseven  (1974)  and  Sielken, 
Hartley,  and  Spoeri  (1976). 

1.4  An  Integrated  System  of  Computer  Programs 

Prior  to  this  research,  separate  computer  programs  had  been 
written  to  perform  each  of  the  five  steps.  (These  programs  are  fully 
documented  in  the  references  cited  for  each  step.)  However,  from  a 
user's  viewpoint,  this  arrangement  was  awkward  because  the  programs 
had  to  be  executed  one  at  a time  and  the  output  from  each  step  had  to 
be  manually  modified  for  use  by  the  next  program  in  the  sequence. 

Thus,  one  of  the  objectives  of  this  research  has  been  to  fashion  the 
individual  programs  into  an  integrated  package  that  is  more  practicable 
from  a user's  viewpoint. 

The  ability  to  schedule  large  projects  with  as  many  as  1000 
activities  in  the  project  network  and  as  many  as  500  activities  in  any 
simplified  subnetwork  was  one  of  the  desired  characteristics  of  the 
computer  implementation.  This  objective  prohibited  the  combination 
of  the  five  original  programs  into  a single  large  program  since  doing 
so  would  drastically  limit  the  size  of  the  project  that  could  be 
analyzed  because  of  the  computer  core  storage  restrictions.  Thus,  the 
computer  implementation  of  the  new  project  scheduling  procedure  has 
been  in  the  form  of  several  individual  programs  internally  linked 
together. The resulting integrated system of computer programs
requires  that  the  user  supply  the  project  description  and  algorithm 
parameters  only  to  the  main  (first)  program.  From  then  on  each 
program  automatically  prepares  the  proper  input  for  the  remaining 
programs  in  the  iterative  procedure  and  stores  this  information  on 
either  disk  or  tape  from  which  it  is  retrieved  as  needed.  The  job 
control  language  automatically  calls  the  individual  programs  as  it 
cycles  through  the  five  step  iterative  algorithm. 

Since  the  new  project  scheduling  procedure  is  an  iterative 
procedure,  it  may  repeat  Steps  1-5.  However,  as  pointed  out  in 
Sielken  and  Hartley  (1977),  after  the  initial  performance  of  Steps 
1-3  their  subsequent  performance  is  greatly  simplified.  Thus,  special 
simplified  versions  of  the  programs  for  these  steps  are  called  when 
these  steps  are  repeated.  Needless  to  say,  the  preparation  of  these 
simplified  versions  has  greatly  improved  the  efficiency  of  the  computer 
implementation.

The  new  project  scheduling  software  package  that  has  resulted  is 
fully  documented  in  the  User's  Guide  found  in  Baker  and  Sielken  (1978). 
Included  in  Baker  and  Sielken  (1978)  is  an  example  with  a complete 
listing  of  the  system's  input  and  output. 

1.5  The  Determination  of  the  Subnetwork 
Duration  Distribution 

Two new theoretical contributions to the project scheduling
procedure are documented in this report. Both are improvements
to the statistical methodologies used in determining the subnetwork
duration distributions. Section 2 of this report contains a
detailed description of the procedure (including the improvements)
used to determine the subnetwork duration distributions. Sections 3
and 4 present detailed documentation of the improvements.

From a statistical viewpoint a subnetwork's duration is defined
easily enough as the maximum of the path durations through the
subnetwork. In Figure 1, for example, there are really three paths;
namely,

$$P_1 = A + C + E$$
$$P_2 = A + F + D + E$$
$$P_3 = B + D + E . \qquad (1.1)$$

The project duration is simply the maximum of $P_1$, $P_2$, and $P_3$.
However, the difficulty is that the paths are usually dependent since
the paths often have activities in common. For example, the paths
$P_1$ and $P_2$ have activities A and E in common. Section 5 contains a
review of the few known general results concerning the distribution of
the maximum of dependent random variables. Also in Section 5 is an
indication of how these general results could be used to modify the
Subnetwork Analysis procedure described in Section 2.
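
The three paths in (1.1) and the shared activities that make them dependent can be checked with a short Python sketch. The node numbers assigned to each arc are an assumption inferred from the precedence rules, since the figure itself is not reproduced in the text, and the durations are purely hypothetical.

```python
# Arcs of the Figure 1 network as (tail node, head node). The node numbers
# are an assumption inferred from the precedence rules, not read from the figure.
ARCS = {
    "A": (1, 2), "B": (1, 3), "F": (2, 3),  # F is the dummy activity
    "C": (2, 4), "D": (3, 4), "E": (4, 5),
}


def all_paths(arcs, source, sink):
    """Return every source-to-sink path as a tuple of activity labels."""
    paths = []

    def extend(node, prefix):
        if node == sink:
            paths.append(tuple(prefix))
            return
        for name, (tail, head) in sorted(arcs.items()):
            if tail == node:
                extend(head, prefix + [name])

    extend(source, [])
    return paths


def project_duration(arcs, durations, source, sink):
    """Project duration = maximum path length for fixed activity durations."""
    return max(sum(durations[a] for a in p) for p in all_paths(arcs, source, sink))


paths = all_paths(ARCS, 1, 5)
print(paths)                            # the three paths of (1.1)
print(set(paths[0]) & set(paths[1]))    # activities shared by P1 and P2

# Hypothetical fixed durations; the dummy activity F takes no time.
d = {"A": 3.0, "B": 4.0, "C": 2.0, "D": 5.0, "E": 1.0, "F": 0.0}
print(project_duration(ARCS, d, 1, 5))  # 10.0
```

When the durations are random rather than fixed, the shared activities A and E enter both path sums, which is exactly why the path lengths are not independent.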
2.  ANALYSIS OF A SUBNETWORK


2.1  Introduction 


The  objective  of  Subnetwork  Analysis  is  to  determine  each 
subnetwork’s  duration  distribution. 

At the end of Step 2 each activity in the subnetwork has a
specified duration distribution. This distribution is now approximated
by a two-point discrete distribution. In particular, an activity, say
A, is now conceptualized as having two possible duration times, say
$l_A$ for a lower duration and $u_A$ for an upper duration. The
probability that the activity duration is $l_A$ is assumed to be $P_A$,
and correspondingly the probability that the activity duration is $u_A$
is assumed to be $Q_A = 1 - P_A$. The values of $l_A$, $u_A$, and $P_A$
are chosen so that the mean, variance, and third moment of the discrete
distribution are the same as the mean, variance, and third moment of
activity A's specified duration distribution.
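
One way to carry out this moment matching is sketched below. The closed form is standard algebra for two-point distributions and is not taken from this report; the function name is hypothetical, and the third moment is assumed to be supplied as a central moment (matching the third central moment is equivalent to matching the third raw moment once the mean and variance agree).

```python
import math


def two_point_fit(mean, variance, third_central_moment):
    """Return (lower, upper, p_lower) matching the three given moments.

    p_lower plays the role of P_A, the probability of the lower point l_A.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    sd = math.sqrt(variance)
    skew = third_central_moment / sd ** 3
    # For a two-point distribution, skewness = (P - Q) / sqrt(P * Q).
    p_lower = 0.5 * (1.0 + skew / math.sqrt(4.0 + skew ** 2))
    q_upper = 1.0 - p_lower
    lower = mean - sd * math.sqrt(q_upper / p_lower)
    upper = mean + sd * math.sqrt(p_lower / q_upper)
    return lower, upper, p_lower


# A symmetric distribution with mean 10 and standard deviation 5.
print(two_point_fit(10.0, 25.0, 0.0))  # (5.0, 15.0, 0.5)
```

A negative third central moment moves probability toward the upper point and pushes the lower point further from the mean, and conversely for a positive one.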

Let n be the number of activities in the subnetwork. Let
$v = 1, 2, \ldots, 2^n$ index the $2^n$ possible configurations of
activity durations when each activity is either at its upper duration
or at its lower duration. Let

$$p_v = \text{probability of the } v\text{-th activity duration configuration}
      = \prod_{i=1}^{n} \left[ P_i (1 - \delta_{iv}) + Q_i \delta_{iv} \right] \qquad (2.1)$$

where

$$\delta_{iv} = 1 \text{ if the duration for the } i\text{-th activity is } u_i
  \text{ in the } v\text{-th activity duration configuration}$$
$$\delta_{iv} = 0 \text{ if the duration for the } i\text{-th activity is } l_i
  \text{ in the } v\text{-th activity duration configuration.} \qquad (2.2)$$

Then the subnetwork duration distribution when each activity has its
two-point discrete distribution is

$$F(t) = \sum_{v=1}^{2^n} p_v I_t(t_v) \qquad (2.3)$$

where

$$t_v = \text{the subnetwork duration when the activity durations
  are in the } v\text{-th configuration} \qquad (2.4)$$

and

$$I_t(t_v) = 1 \text{ if } t_v \le t, \qquad I_t(t_v) = 0 \text{ if } t_v > t . \qquad (2.5)$$


The discrete distribution function F is an approximation to the
subnetwork's exact duration distribution.

The  goal  of  Subnetwork  Analysis  is  to  determine  F. 
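
For a small subnetwork, F can be obtained by complete enumeration exactly as (2.1) through (2.5) describe. The Python sketch below assumes the subnetwork is supplied as a list of source-to-sink paths; the function names and all numerical values are hypothetical and chosen only to illustrate the computation.

```python
from itertools import product


def enumerate_distribution(paths, lower, upper, p_lower):
    """Return sorted (t_v, p_v) pairs over all 2**n configurations.

    paths   : list of source-to-sink paths, each a sequence of activity names
    lower   : dict activity -> lower duration l_i
    upper   : dict activity -> upper duration u_i
    p_lower : dict activity -> P_i, the probability of the lower duration
    """
    acts = sorted(lower)
    atoms = []
    for delta in product((0, 1), repeat=len(acts)):      # delta_iv of (2.2)
        dur, prob = {}, 1.0
        for a, d in zip(acts, delta):
            dur[a] = upper[a] if d else lower[a]
            prob *= (1.0 - p_lower[a]) if d else p_lower[a]  # factor of (2.1)
        t_v = max(sum(dur[a] for a in path) for path in paths)  # (2.4)
        atoms.append((t_v, prob))
    return sorted(atoms)


def cdf(atoms, t):
    """F(t) of (2.3): total probability of configurations with t_v <= t."""
    return sum(p for t_v, p in atoms if t_v <= t)


# Paths of a five-activity network (a zero-length dummy arc is omitted).
paths = [("A", "C", "E"), ("A", "D", "E"), ("B", "D", "E")]
lower = {"A": 2, "B": 3, "C": 1, "D": 4, "E": 1}
upper = {"A": 4, "B": 5, "C": 3, "D": 6, "E": 2}
p_lower = {a: 0.5 for a in lower}

atoms = enumerate_distribution(paths, lower, upper, p_lower)
print(len(atoms))      # 32 configurations for n = 5
print(cdf(atoms, 8))   # 0.0625
print(cdf(atoms, 13))  # 1.0
```

The loop visits every one of the $2^n$ configurations, so the cost doubles with each added activity.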

Since the number, n, of activities in the subnetwork may be fairly
large, the complete enumeration of the $2^n$ discrete subnetwork
durations may sometimes be impractical. When this happens, the discrete
subnetwork duration distribution F must be approximated. The
approximation of F will be based on the activities which are most likely
to influence the subnetwork duration. The identification of these
important activities and their interrelationships is discussed in the
next subsection, which is a review of the procedures originating in
Sielken, Ringer, Hartley, and Arseven (1974) and Sielken, Hartley, and
Spoeri (1976).

Each  subnetwork  is  assumed  to  be  an  acyclic  network  with  one 
source,  one  sink,  and  no  cut  vertices. 


2.2  Formation  of  Clusters 


The mean duration for activity A is

$$m_A = P_A l_A + Q_A u_A , \qquad (2.6)$$

and the standard deviation of activity A's duration is

$$s_A = \left[ P_A l_A^2 + Q_A u_A^2 - m_A^2 \right]^{1/2} . \qquad (2.7)$$
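
As a quick numerical check of (2.6) and (2.7), the following Python sketch recomputes the mean and standard deviation from a lower point, an upper point, and the probability of the lower point. The function name is hypothetical, and the input triples are the Table 1 values for activities 2 and 7 as read from the extracted text.

```python
import math


def mean_and_sd(lower, upper, p_lower):
    """Mean (2.6) and standard deviation (2.7) of a two-point distribution."""
    q_upper = 1.0 - p_lower
    m = p_lower * lower + q_upper * upper
    s = math.sqrt(p_lower * lower ** 2 + q_upper * upper ** 2 - m ** 2)
    return m, s


for l_a, u_a, p_a in [(8.00, 10.50, 0.2), (8.73, 16.90, 0.6)]:
    m, s = mean_and_sd(l_a, u_a, p_a)
    print(round(m, 2), round(s, 2))
```

Both triples reproduce the tabulated means and standard deviations to two decimals, namely (10, 1) and (12, 4).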


When  each  activity  duration  takes  on  a fixed  (nonrandom)  value, 
the  subnetwork's  duration  is  the  duration  of  the  longest  path  through 
the  subnetwork  where  the  "length"  of  an  activity  is  its  duration. 

For  example,  consider  the  subnetwork  described  in  Table  1 and  displayed 
in  Figure  2.  When  each  activity  duration  is  its  mean  duration,  then 
the  subnetwork's  duration  is  32,  corresponding  to  the  path  consisting 
of  activities  2,  7,  and  9. 


Definition 1:  A critical activity is an activity on the longest
path when all the subnetwork's activity durations are set to their
means.


Thus  in  the  example  the  critical  activities  are  2,  7,  and  9. 

TABLE 1

Activity Durations for the Subnetwork in Figure 2

Activity i     l_i      u_i     P_i    m_i    s_i
    1          0.00     2.00    .5      1      1
    2          8.00    10.50    .2     10      1
    3          9.55    15.67    .6     12      3
    4          1.50     4.00    .8      2      1
    5          3.35     5.52    .7      4      1
    6          4.00     6.00    .5      5      1
    7          8.73    16.90    .6     12      4
    8         12.00    14.50    .2     14      1
    9          5.00    15.00    .5     10      5

Figure 2

Subnetwork with activities labeled (activity number; mean duration).

The search for the activities which are most likely to influence
the subnetwork duration begins with the critical activities. Each
critical activity initiates a separate set of activities called a
"cluster". Initially there are several clusters. In the example the
initial clusters are

$$C_1 = \{2\}, \quad C_2 = \{7\}, \quad \text{and} \quad C_3 = \{9\} . \qquad (2.8)$$

Some of the non-critical activities may influence the subnetwork's
duration when not all of the activity durations are at their mean
values.

Definition 2:  An associate of a critical activity A is a
non-critical activity which is on the longest path when all activity
durations are set to their means except for the critical activity A,
which has its duration reduced from $m_A$ to $\max(m_A - \lambda s_A, 0)$,
where $\lambda$ is a nonnegative parameter.

Thus the associates of a critical activity A are those activities
whose effects on the subnetwork's duration are related to activity A's
duration. In the example, for $\lambda = 1$ the associates of the
critical activities 2, 7, and 9 can be determined by considering
Figures 3, 4, and 5 respectively. In Figure 3 the longest path is still
the critical path 2, 7, and 9, so that activity 2 has no associates.
In Figure 4 the longest path is 2, 5, 6, and 9, so that activities 5
and 6 are the associates of activity 7. In Figure 5 the longest path
is 2, 5, and 8, so that activities 5 and 8 are the associates of
activity 9.

The  associates  of  each  critical  activity  are  determined  and  added 
to  the  cluster  containing  that  critical  activity.  Thus,  in  the 
example  the  clusters  are  expanded  to 


$$C_1 = \{2\}, \quad C_2 = \{7, 5, 6\}, \quad \text{and} \quad C_3 = \{9, 5, 8\} . \qquad (2.9)$$
Figure 3

Subnetwork for determining the associates of Activity 2 when $\lambda = 1$.

Figure 4

Subnetwork for determining the associates of Activity 7 when $\lambda = 1$.

Figure 5

Subnetwork for determining the associates of Activity 9 when $\lambda = 1$.

Figure 6

Subnetwork for determining the eliminants of Activity 3 when $\theta = 2$.

The idea underlying the clusters is that they should be sets of
activities whose effects on the subnetwork's duration are interrelated.
Thus, if two clusters contain any activities in common, the activities
in these two clusters all have an interrelated effect on the
subnetwork's duration, so the two clusters are combined into one cluster.
In the example clusters $C_2$ and $C_3$ both contain activity 5, so they
are combined. The resulting clusters are

$$C_1 = \{2\} \quad \text{and} \quad C_2 = \{5, 6, 7, 8, 9\} . \qquad (2.10)$$

A non-critical activity may also influence the subnetwork's
duration if its duration exceeds its mean.

Definition 3:  An eliminant of a non-critical activity A is a
critical activity which is not on the longest path when all activity
durations are set to their means except for activity A, which has its
duration increased from $m_A$ to $m_A + \theta s_A$, where $\theta$ is a
nonnegative parameter.

For instance, if $\theta = 2$, the eliminants of the non-critical
activity 3 in the example can be determined from Figure 6. There the
longest path is 1, 3, 6, and 9, so that the eliminants of activity 3
are the critical activities 2 and 7. In the example, when $\theta = 2$,
none of the other non-critical activities (1, 4, 5, 6, and 8) have any
eliminants.

For a specified value of $\theta$ the eliminants of every non-critical
activity are determined. If a non-critical activity A has eliminants,
then the effect of A's eliminants on the subnetwork duration is related
to A's duration, so A is added to every cluster containing at least one
of its eliminants. Thus in the example the clusters become

$$C_1 = \{2, 3\} \quad \text{and} \quad C_2 = \{3, 5, 6, 7, 8, 9\} . \qquad (2.11)$$

After the clusters have been expanded on the basis of eliminants,
any two clusters containing common elements are combined. Therefore
in the example, $C_1$ and $C_2$ are combined to form a single cluster

$$C_1 = \{2, 3, 5, 6, 7, 8, 9\} . \qquad (2.12)$$
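
The whole cluster formation procedure (critical activities, associates, merging, eliminants, merging) can be sketched in Python as follows. The means and standard deviations are those of Table 1. The path list, however, is HYPOTHETICAL: Figure 2 is not reproduced in the text, so the topology used here is only one arrangement consistent with the paths named in the worked example, and the position of activity 4 in particular is a guess. The function names are likewise illustrative.

```python
# Table 1 means and standard deviations (m_i, s_i).
MEAN = {1: 1, 2: 10, 3: 12, 4: 2, 5: 4, 6: 5, 7: 12, 8: 14, 9: 10}
SD = {1: 1, 2: 1, 3: 3, 4: 1, 5: 1, 6: 1, 7: 4, 8: 1, 9: 5}

# HYPOTHETICAL source-to-sink paths; an assumption, not read from Figure 2.
PATHS = [(2, 7, 9), (2, 5, 6, 9), (2, 5, 8), (1, 3, 6, 9), (1, 3, 8),
         (1, 4, 7, 9), (1, 4, 5, 6, 9), (1, 4, 5, 8)]


def longest_path(dur):
    """Path of maximum total duration for fixed activity durations."""
    return max(PATHS, key=lambda p: sum(dur[a] for a in p))


def merge_overlapping(clusters):
    """Combine any clusters that share at least one activity."""
    merged = []
    for c in clusters:
        c = set(c)
        rest = []
        for m in merged:
            if m & c:
                c |= m
            else:
                rest.append(m)
        merged = rest + [c]
    return merged


def form_clusters(lam, theta):
    critical = set(longest_path(MEAN))               # Definition 1
    clusters = []
    for a in sorted(critical):                       # Definition 2: associates
        dur = dict(MEAN)
        dur[a] = max(MEAN[a] - lam * SD[a], 0)
        clusters.append({a} | (set(longest_path(dur)) - critical))
    clusters = merge_overlapping(clusters)
    for a in sorted(set(MEAN) - critical):           # Definition 3: eliminants
        dur = dict(MEAN)
        dur[a] = MEAN[a] + theta * SD[a]
        eliminants = critical - set(longest_path(dur))
        for c in clusters:
            if c & eliminants:
                c.add(a)
    return merge_overlapping(clusters)


print([sorted(c) for c in form_clusters(lam=1, theta=2)])
# [[2, 3, 5, 6, 7, 8, 9]]
```

Under the assumed topology the sketch passes through the intermediate clusters of (2.10) and (2.11) and ends with the single cluster of (2.12); with $\theta = 0$ no eliminants arise and the clusters of (2.10) are returned unchanged.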

In general, after the determination of associates and eliminants
for specified values of $\lambda$ and $\theta$ and the subsequent
combining of clusters, there may still be more than one cluster and
some of the non-critical activities may not be in any cluster. Usually
the larger the values of $\lambda$ and $\theta$ the greater the number
of activities in the clusters and the smaller the number of clusters.
The clusters that remain represent sets of activities such that the
effects on the subnetwork's duration of the activity durations for the
activities within a set are all interrelated. Activities in different
clusters have roughly independent effects on the subnetwork's duration.
Activities not in any cluster have essentially no effect on the
subnetwork's duration.

The  consideration  of  critical  activities,  associates,  eliminants, 
and  the  formation  of  clusters  of  related  activities  is  obviously  only 
one  way  of  identifying  the  activities  which  have  an  important  effect 
on  the  subnetwork’s  duration  and  their  interrelationships.  However, 
this  particular  procedure  does  have  the  following  desirable  properties: 


Property 1:  If $\lambda_2 > \lambda_1$, then any activity which would be
an associate of a critical activity A when $\lambda = \lambda_1$
would also be an associate of A when $\lambda = \lambda_2$.

Property 2:  If $\theta_2 \ge \theta_1$, then any critical activity which
would be an eliminant of a non-critical activity A when
$\theta = \theta_1$ would also be an eliminant of A when
$\theta = \theta_2$.

Property 3:  For any fixed value of $\lambda$, the set of activities in
the union of the clusters is monotonically nondecreasing as
$\theta \to \infty$.

Property 4:  The number of clusters is nonincreasing as
$\theta \to \infty$.

Property 5:  If $s_A > 0$ for a non-critical activity A, then there
exists $\theta_A^* < \infty$ such that A will have some eliminants
for any $\theta \ge \theta_A^*$.

Property 6:  If $s_A > 0$ for every non-critical activity A and

$$\theta^* = \max\{\theta_A^* ;\ A \text{ non-critical}\} ,$$

then for $\theta \ge \theta^*$ all activities will be in one cluster.

Most of these properties are fairly straightforward; however,
Property 6 requires some special justification. This justification
is based on the following definition and theorem, which is proven in
Sielken, Ringer, Hartley, and Arseven (1974).


Definition 4:  In any acyclic network a bridge over any two
consecutive arcs $A_1$ and $A_2$ is any arc $A_3$ such that all paths
from the source to the sink passing through $A_3$ do not pass through
either $A_1$ or $A_2$.

Theorem 1:  In any acyclic network with no cut vertices there is
at least one bridge for any pair of consecutive arcs.

Property 5 implies that all activities will belong to some cluster if
$\theta \ge \theta^*$. Now consider any two consecutive activities $A_1$
and $A_2$ on the critical path. Theorem 1 implies that there is a bridge
over $A_1$ and $A_2$, say $A_3$. Since the critical path passes through
$A_1$ and $A_2$, $A_3$ cannot be on the critical path. Therefore, if
$\theta \ge \theta^* \ge \theta_{A_3}^*$, $A_1$ and $A_2$ will be
eliminants of $A_3$ and hence will be in the same cluster as $A_3$.
Thus, since each cluster contains at least one original critical
activity and any two consecutive critical path activities belong to
the same cluster when $\theta \ge \theta^*$, there is only one cluster
when $\theta \ge \theta^*$ and Property 6 is established.

2.3  Bounding  the  Discrete  Subnetwork  Duration  Distribution  F 
2.3.1  Upper  Bounds  on  F 

Suppose that the cluster formation procedure described in
subsection 2.2 has been carried out on a subnetwork for some specified
values of $\theta$ and $\lambda$ and yielded K clusters. For each
cluster C so determined, let $n_C$ be the number of activities in the
cluster and let $v = 1, \ldots, 2^{n_C}$ index the $2^{n_C}$
configurations of activity durations corresponding to

(a)  the duration for each activity A not in C being equal to
its lower point $l_A$, and

(b)  the durations for the activities in C being at each of the
$2^{n_C}$ possible combinations of their upper and lower points.


Then define

$$F^+(C; t) = \sum_{v=1}^{2^{n_C}} p_v I_t(t_v) \qquad (2.13)$$

where $p_v$, $t_v$, and $I_t(t_v)$ are defined in (2.1), (2.4), and (2.5)
respectively. The distribution function $F^+(C; t)$ is an upper bound
on F. This can be shown by considering the following:

Theorem 2:  For any cluster C, any t, and any activity A not
in C,

$$F^+(C \cup \{A\}; t) \le F^+(C; t) .$$

(For the proof of this theorem, see Sielken, Hartley, and Spoeri
(1975).) A straightforward application of Theorem 2 yields

Theorem 3:  For any two clusters $C_1$ and $C_2$ and any t,

$$F^+(C_1 \cup C_2; t) \le \min\{F^+(C_1; t),\ F^+(C_2; t)\} .$$

If C* represents the set (cluster) of all activities in the subnetwork,
then

$$F(t) = F^+(C^*; t) . \qquad (2.14)$$

Since C is a subset of C*, either Theorem 2 or Theorem 3 implies

$$F^+(C; t) \ge F(t) \qquad (2.15)$$

for any cluster C.

Theorems 2 and 3 can also be used to define some tighter upper
bounds on the subnetwork's duration distribution than $F^+(C; t)$.
Historically, two different improved bounds have been employed, and
both have been incorporated into the current subnetwork analysis
procedure. They are

$$F_1^+(t; \theta, \lambda) = F^+\Big(\bigcup_{i=1}^{K} C_i ;\ t\Big) \qquad (2.16)$$

and

$$F_2^+(t; \theta, \lambda) = \min_{1 \le i \le K} F^+(C_i; t) . \qquad (2.17)$$

Let $F^+(t; \theta, \lambda)$ denote either $F_1^+(t; \theta, \lambda)$ or
$F_2^+(t; \theta, \lambda)$. Since Property 2 of the cluster formation
procedure implies that as $\theta$ increases the clusters expand or are
combined, Theorems 2 and 3 imply that $F^+(t; \theta, \lambda)$ is a
nonincreasing function of $\theta$ for every t and any $\lambda$.
Property 6 and (2.14) imply that for $\theta \ge \theta^*$

$$F^+(t; \theta, \lambda) = F(t) \qquad (2.18)$$

for every t and any $\lambda$. Also (2.14) along with the definitions
(2.16) and (2.17) imply

$$F^+(t; \theta, \lambda) \ge F(t) \qquad (2.19)$$

for all t, $\theta$, and $\lambda$. These results are summarized in
Theorem 4.

Theorem 4:  (a)  $F^+(t; \theta, \lambda)$ is a nonincreasing function
of $\theta$ for every t and any $\lambda$;

(b)  there exists a finite value $\theta^*$ such that
$\theta \ge \theta^*$ implies $F^+(t; \theta, \lambda) = F(t)$ for
every t and $\lambda$; and

(c)  for any $\theta$, $\lambda$, and t

$$F^+(t; \theta, \lambda) \ge F(t) .$$
2.3.2  Lower Bounds on F

Let $n_C$ denote the number of activities in cluster C, and let
$v = 1, \ldots, 2^{n_C}$ index the $2^{n_C}$ configurations of activity
durations corresponding to

(a)  the duration for each activity A not in the cluster being
equal to its upper point $u_A$, and

(b)  the durations for activities in the cluster being at each
of the $2^{n_C}$ possible combinations of the upper and lower
points.

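Then define

$$F^-(C; t) = \sum_{v=1}^{2^{n_C}} p_v I_t(t_v) \qquad (2.20)$$

where $p_v$, $t_v$, and $I_t(t_v)$ are as previously defined. Also define

$$F_1^-(t; \theta, \lambda) = F^-\Big(\bigcup_{i=1}^{K} C_i ;\ t\Big) \qquad (2.21)$$

and

$$F_2^-(t; \theta, \lambda) = \max_{1 \le i \le K} F^-(C_i; t) . \qquad (2.22)$$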

Using an argument completely analogous to that used to prove
Theorem 4, Sielken, Hartley, and Spoeri (1975) also proved

Theorem 5:  (a)  $F^-(t; \theta, \lambda)$ is a nondecreasing function
of $\theta$ for any fixed value of $\lambda$;

(b)  there exists a finite value $\theta^*$ such that
$\theta \ge \theta^*$ implies

$$F^-(t; \theta, \lambda) = F(t)$$

for every t and any $\lambda$; and

(c)  for any $\theta$, $\lambda$, and t

$$F^-(t; \theta, \lambda) \le F(t) .$$

(Again, $F^-(t; \theta, \lambda)$ is a generic term used to denote either
$F_1^-(t; \theta, \lambda)$ or $F_2^-(t; \theta, \lambda)$.) Thus,
$F^-(t; \theta, \lambda)$ is a valid lower bound on F.
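
The cluster bounds $F^+(C; t)$ and $F^-(C; t)$ differ only in where the activities outside the cluster are held, which the Python sketch below makes explicit. The network, the durations, and the cluster are hypothetical, and $p_v$ is taken here to be the product of the probabilities of the cluster's activities only, which is one reading of how (2.1) applies to a cluster configuration.

```python
from itertools import product

# Hypothetical five-activity network given by its source-to-sink paths.
PATHS = [("A", "C", "E"), ("A", "D", "E"), ("B", "D", "E")]
LOWER = {"A": 2, "B": 3, "C": 1, "D": 4, "E": 1}
UPPER = {"A": 4, "B": 5, "C": 3, "D": 6, "E": 2}
P_LOW = {a: 0.5 for a in LOWER}


def bound(cluster, t, outside):
    """F+(C; t) if outside == 'lower', F-(C; t) if outside == 'upper'."""
    fixed = LOWER if outside == "lower" else UPPER
    members = sorted(cluster)
    total = 0.0
    for delta in product((0, 1), repeat=len(members)):
        dur, prob = dict(fixed), 1.0     # activities outside C stay fixed
        for a, d in zip(members, delta):
            dur[a] = UPPER[a] if d else LOWER[a]
            prob *= (1.0 - P_LOW[a]) if d else P_LOW[a]
        t_v = max(sum(dur[a] for a in p) for p in PATHS)
        if t_v <= t:
            total += prob
    return total


def exact(t):
    """F(t) via (2.14): the cluster C* containing every activity."""
    return bound(set(LOWER), t, "lower")


cluster = {"D", "E"}
for t in (8, 10, 12):
    lo, f, hi = bound(cluster, t, "upper"), exact(t), bound(cluster, t, "lower")
    assert lo <= f <= hi
    print(t, lo, f, hi)
```

Holding the outside activities at their lower points can only shorten the longest path, which raises the distribution function, while holding them at their upper points lowers it; the exact F lies between the two at every t.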

2.3.3  The  Tightness  of  the  Bounds  on  F 

That the $F_1$-bounds are tighter than the $F_2$-bounds can be seen as
follows. The evaluation of $F_2^-(t; \theta, \lambda)$ involves the
determination of $F^-(C_i; t)$ for each i, whereas
$F_1^-(t; \theta, \lambda) = F^-(\bigcup_{i=1}^{K} C_i ;\ t)$. Let $L_i$ be
the length of the longest path when

1)  the activities in $C_i$ are at a particular configuration of their
upper and lower durations and

2)  all activities not in $C_i$ have their upper durations.

Let $L_U$ be the length of the longest path when

1)  the configuration of upper and lower durations for the
activities in $C_i$ is the same as in the determination of $L_i$,

2)  the activities in $\bigcup_{j=1}^{K} C_j - C_i$ are at any combination
of their upper and lower durations, and

3)  all activities not in $\bigcup_{j=1}^{K} C_j$ have their upper
durations.

Then $L_i \ge L_U$ since every activity duration in the determination of
$L_i$ is greater than or equal to its corresponding duration in the
determination of $L_U$. Since $L_i \ge L_U$ for any configuration of
upper and lower durations for the activities in $C_i$,

$$F^-\Big(\bigcup_{j=1}^{K} C_j ;\ t\Big) \ge F^-(C_i; t) \qquad (2.23)$$

and

$$F_1^-(t; \theta, \lambda) = F^-\Big(\bigcup_{j=1}^{K} C_j ;\ t\Big)
  \ge \max_{1 \le i \le K} F^-(C_i; t) = F_2^-(t; \theta, \lambda) . \qquad (2.24)$$

A similar argument can be used to show

$$F_1^+(t; \theta, \lambda) = F^+\Big(\bigcup_{j=1}^{K} C_j ;\ t\Big)
  \le \min_{1 \le i \le K} F^+(C_i; t) = F_2^+(t; \theta, \lambda) . \qquad (2.25)$$

The  extent  of  the  differences  between  the  two  upper  bounds  and  two 
lower  bounds  depends  heavily  on  the  structure  of  the  particular  sub- 
network being  analyzed  and  is  a topic  that  should  be  considered  in 
future  empirical  studies. 


2.4  Using  Sampling  to  Estimate  the  Upper  and 
Lower  Bounds  on  F 


The only instance in which upper and lower bounds on F are
computed rather than F itself is when it is computationally impractical
to determine the longest path for each of the $2^n$ activity
duration configurations, where n is the number of activities in the
subnetwork.

For given $\theta$ and $\lambda$, the evaluation of $F_1^-(t; \theta, \lambda)$ only requires
the determination of the longest path for each of $2^{n_U}$ activity
configurations where $n_U$ is the number of activities in the union of the
clusters $C = \bigcup_{j=1}^{K} C_j$; i.e.,

$$n_U = \sum_{j=1}^{K} n_j . \tag{2.26}$$

The evaluation of $F_1^+(t; \theta, \lambda)$ also entails only $2^{n_U}$ longest path
determinations.  Likewise, the evaluation of $F_2^+(t; \theta, \lambda)$ or $F_2^-(t; \theta, \lambda)$
only requires the determination of the longest path for each of

$$\sum_{i=1}^{K} 2^{n_i} \tag{2.27}$$

activity configurations.  Since $2^{n_U}$ is always greater than or equal to
this sum, the $F_2$-bounds are the most economical bounds to
compute in terms of the number of longest path determinations required.
However, for any given $\theta$ and $\lambda$, $F_1^+(t; \theta, \lambda)$ and $F_1^-(t; \theta, \lambda)$ are tighter
bounds than $F_2^+(t; \theta, \lambda)$ and $F_2^-(t; \theta, \lambda)$, respectively.  Thus, in making
the choice of which one of the two sets of bounds to compute, there is
a trade-off between the accuracy of the bounds and the effort required
to compute them.
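
The size of this trade-off is easy to tabulate.  The following is a minimal Python sketch, not part of the original program; the function name `path_determinations` is hypothetical, and the clusters are assumed to be disjoint so that the union contains the sum of the cluster sizes, as in (2.26).

```python
def path_determinations(cluster_sizes):
    """Return (union_count, per_cluster_count) of longest-path determinations.

    cluster_sizes: numbers of activities n_1, ..., n_K in the clusters,
    assumed disjoint (an assumption of this sketch).
    """
    n_u = sum(cluster_sizes)                      # activities in the union
    union_count = 2 ** n_u                        # needed for the F_1 bounds
    per_cluster_count = sum(2 ** n_i for n_i in cluster_sizes)  # F_2 bounds
    return union_count, per_cluster_count


if __name__ == "__main__":
    for sizes in [(3,), (3, 4), (5, 6, 7)]:
        union_count, per_cluster_count = path_determinations(sizes)
        print(sizes, union_count, per_cluster_count)
```

For a single cluster the two counts coincide; the gap grows quickly as clusters are added.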

Since the cluster formation procedure is such that the clusters
expand or are pooled as $\theta$ increases, it may happen that for particular
$\theta$ and $\lambda$, $2^{n_U}$ and $2^{n_i}$ for some i are both quite large even though $\theta$ is
only moderately large.  In this case it again becomes impractical to
examine all the required activity configurations involved in determining
either the $F_1$-bounds or the $F_2$-bounds.  Consequently, if for the
specified values of $\theta$ and $\lambda$, $2^{n_U}$ (or $2^{n_i}$ for some i, as the case may
be) is excessively large, Subnetwork Analysis will compute estimates of
the corresponding upper and lower bounds based on only a sample of the
total number of possible configurations.  The actual estimators used
in this situation are described and developed in Section 3.

2.5  Estimating  F by  Extrapolating  Between 
the  Upper  and  Lower  Bounds  on  F 

Theorems 4 and 5 of subsection 2.3 imply that if $\theta_{i+1} \ge \theta_i$ and
$\lambda_{i+1} \ge \lambda_i$, $i = 1, \ldots, I-1$, then

$$F^+(t; \theta_1, \lambda_1) \ge F^+(t; \theta_2, \lambda_2) \ge \cdots \ge F^+(t; \theta_I, \lambda_I) \ge F(t) \ge F^-(t; \theta_I, \lambda_I) \ge F^-(t; \theta_{I-1}, \lambda_{I-1}) \ge \cdots \ge F^-(t; \theta_1, \lambda_1) \tag{2.28}$$

for all t.  Thus, if $F^+(t; \theta, \lambda)$ and $F^-(t; \theta, \lambda)$ have been calculated
for I pairs $(\theta_i, \lambda_i)$, $i = 1, \ldots, I$ ($\theta_{i+1} \ge \theta_i$, $\lambda_{i+1} \ge \lambda_i$), then F(t)
may be estimated by extrapolating between $F^+(t; \theta_i, \lambda_i)$ and
$F^-(t; \theta_i, \lambda_i)$.  As currently written, Subnetwork Analysis calculates
upper and lower bounds on the subnetwork's approximate duration
distribution for a sequence of three $(\theta, \lambda)$ pairs, $(\theta, \lambda)$ = (1, 1),
(2, 2), (3, 2).  An extrapolation procedure is then used to obtain an
estimate of F.  The procedure that has been developed for this purpose
is documented in Section 4 of this report.

2.6  A Summary  of  the  Subnetwork  Analysis  Procedure 

The following is a step-by-step description of the subnetwork
analysis procedure in summary form.  Recall that the objective of
Subnetwork Analysis is to determine an "approximation", say $\hat{F}$, to
the subnetwork's duration distribution.

(a)  If n = 1, let $\hat{F}$ be the actual activity duration distribution
     for the one activity comprising the subnetwork, and stop.
     Otherwise, go to Step b.

(b)  Identify the two-point discrete distribution $(\ell_A, u_A, P_A, Q_A)$
     for every activity A in the subnetwork.

(c)  Ascertain the user's choice of

     (1)  NMAX, the maximum value of m for which all $2^m$ activity
          duration configurations are to be explicitly considered,

     (2)  the $(\theta, \lambda)$ pairs to be considered if not the standard
          pairs (1, 1), (2, 2), and (3, 2),

     (3)  whether the bounds on F are to be $(F_1^-, F_1^+)$ or $(F_2^-, F_2^+)$
          if n > NMAX, and

     (4)  SAMSIZ, the sample size to be taken if, in the determination
          of bounds on F for some $(\theta, \lambda)$ pair, the number of
          activity configurations in the cluster being considered
          exceeds $2^{\mathrm{NMAX}}$.

(d)  If the number of activities in the subnetwork doesn't exceed
     NMAX, compute the subnetwork's discrete duration distribution,
     F, explicitly, let $\hat{F}$ = F, and stop.  Otherwise, go to Step e.

(e)  Do Steps f - i for every $(\theta, \lambda)$ pair.  Then go to Step j.

(f)  Form the clusters corresponding to $(\theta, \lambda)$.  If the bounds are
     to be $(F_1^-, F_1^+)$, go to Step g.  If the bounds are to be
     $(F_2^-, F_2^+)$, go to Step h.

(g)  Form the union of the clusters and determine $n_U$.  If
     $n_U \le$ NMAX, evaluate the bounds on the basis of all
     $2^{n_U}$ activity duration configurations.  If $n_U >$ NMAX, take a
     sample of size SAMSIZ from the $2^{n_U}$ activity duration
     configurations and form both $F_1^-$ and $F_1^+$ on the basis of this single
     sample.  Go to Step e.

(h)  Do the following for each cluster, $C_i$.  Let $n_i$ denote the
     number of activities in the cluster.  If $n_i \le$ NMAX, evaluate
     $F^-(C_i; t)$ and $F^+(C_i; t)$ on the basis of all $2^{n_i}$ activity
     duration configurations.  If $n_i >$ NMAX, take a sample of size
     SAMSIZ from the $2^{n_i}$ activity duration configurations and form
     both $F^-(C_i; t)$ and $F^+(C_i; t)$ on the basis of this single
     sample.

(i)  Form $F_2^-$ and $F_2^+$ from the $F^-(C_i; t)$'s and $F^+(C_i; t)$'s
     respectively.  Go to Step e.

(j)  Form $\hat{F}$ by extrapolating the $(F^-, F^+)$ bounds determined for
     the $(\theta, \lambda)$ pairs.  Stop.

This process is repeated for every subnetwork in the simplified
project network.
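
The enumerate-or-sample decision made in Steps g and h can be illustrated with a short Python sketch.  This is only an outline under stated assumptions: the function `evaluation_plan` and its argument names are hypothetical, the clusters are assumed disjoint, and the longest-path calculations themselves are not shown.

```python
def evaluation_plan(cluster_sizes, nmax, samsiz, use_union):
    """Describe how the bounds would be evaluated for one (theta, lambda) pair.

    cluster_sizes: activities per cluster (assumed disjoint).
    nmax, samsiz:  the user's NMAX and SAMSIZ choices.
    use_union:     True for the F_1 bounds (Step g), False for F_2 (Step h).
    Returns a list of (label, n_activities, mode, configurations_examined).
    """
    if use_union:
        groups = [("union", sum(cluster_sizes))]
    else:
        groups = [(f"cluster {i}", n_i)
                  for i, n_i in enumerate(cluster_sizes, start=1)]
    plan = []
    for label, n in groups:
        if n <= nmax:
            plan.append((label, n, "enumerate", 2 ** n))  # all configurations
        else:
            plan.append((label, n, "sample", samsiz))     # a single sample
    return plan


if __name__ == "__main__":
    print(evaluation_plan([3, 12], nmax=10, samsiz=500, use_union=False))
    print(evaluation_plan([3, 12], nmax=10, samsiz=500, use_union=True))
```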


3.  SAMPLE-BASED ESTIMATORS FOR A DISCRETE
DISTRIBUTION FUNCTION

3.1 Introduction

In the subnetwork analysis procedure the calculation of $F^-(C; t)$
or $F^+(C; t)$, say F(C; t), for a cluster C comprised of $n_c$ activities
requires $2^{n_c}$ longest path determinations.  If $2^{n_c}$ is too large from a
practical standpoint, F(C; t) must be approximated on the basis of a
sample of the possible $2^{n_c}$ activity duration configurations.

The estimation of F(C; t) involves two aspects

1)  the  identification  of  an  acceptable  method  of  sample  selec- 
tion, and 

2)  the  determination  of  the  form  of  the  estimator. 

Because  of  the  practical  difficulties  (computer  storage  requirements, 
etc.)  involved  in  implementing  other  sampling  schemes,  only  simple 
random  sampling  (with  replacement)  and  systematic  sampling  were  con- 
sidered in  this  research.  Of  these  two,  systematic  sampling  is  the 
preferred  technique.  The  reasons  for  this  preference  are  presented 
in  subsection  3.4. 

Now F(C; t) is the distribution function of a discrete random
variable X (the length of the subnetwork's longest path) which has a
known number,

$$M = 2^{n_c} \tag{3.1}$$

of possible values which are not necessarily distinct.  Since the
probability that a particular activity, A, attains its lower duration
is known to be $P_A$ and the probability that it attains its upper
duration is known to be $Q_A$, the random variable X is such that when an
activity duration configuration is realized not only is the numerical
value of X observed but also the probability, p, of that activity
duration configuration is available.  This departure from the usual
estimation situation has been exploited in the formation of the
estimators for F(C; t).


3.2  Some  Proposed  Estimators 


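The five estimators of F(C; t) that were considered in this
research are

$$G_1(t) = \frac{\sum_{i=1}^{m} I_t(x_i)}{m} , \tag{3.2}$$

$$G_2(t) = \frac{\sum_{i=1}^{m} p_i I_t(x_i)}{\sum_{i=1}^{m} p_i} , \tag{3.3}$$

$$G_3(t) = \begin{cases} 0 , & t < x_1 \\ G_2(x_j) + \dfrac{t - x_j}{x_{j+1} - x_j} \left[ G_2(x_{j+1}) - G_2(x_j) \right] , & x_1 \le t < x_m \\ 1 , & t \ge x_m ; \end{cases} \tag{3.4}$$

$$G_4(t) = \begin{cases} 0 , & t < x_1 \\ G_2(x_j) + \dfrac{t - x_j}{x_{j+1} - x_j} \left[ G_2(x_{j+1}) - p_{j+1} - G_2(x_j) \right] , & x_1 \le t < x_m \\ 1 , & t \ge x_m ; \end{cases} \tag{3.5}$$

and

$$G_5(t) = \begin{cases} 0 , & t < x_1 \\ \sum_{i=1}^{m} p_i I_t(x_i) + \dfrac{t - x_1}{x_m - x_1} \Big( 1 - \sum_{i=1}^{m} p_i \Big) , & x_1 \le t < x_m \\ 1 , & t \ge x_m \end{cases} \tag{3.6}$$

where in each case, the $x_i$ represent an ordered sample of size m from
the population of subnetwork durations corresponding to the M activity
duration configurations, $p_i$ is the probability of the activity duration
configuration corresponding to $x_i$, j is the largest integer such that
$x_j \le t$ and $I_t(\cdot)$ is as defined in (2.5).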

Even though sampling will normally only be employed if M is
very large, for illustration purposes consider M = 20 and that the
following ordered sample of size m = 5 has been obtained:

$$x_1 = 2 , \; x_2 = 2.5 , \; x_3 = 4.0 , \; x_4 = 4.5 , \; x_5 = 5$$
$$p_1 = .02 , \; p_2 = .03 , \; p_3 = .05 , \; p_4 = .07 , \; p_5 = .03 . \tag{3.7}$$

The $G_1(t), G_2(t), \ldots, G_5(t)$ for this sample data are displayed in
Figures 7a - 7e, respectively.

Figure 7

$G_1(t)$, $G_2(t)$, $G_3(t)$, $G_4(t)$, and $G_5(t)$ for the sample data in (3.7).

The discussion in subsections 3.2.1 - 3.2.5 assumes that the m
sample points have been selected without replacement.

3.2.1  The Empirical Distribution Function, $G_1(t)$

In many situations requiring the estimation of a distribution
function, the empirical distribution function, $G_1(t)$, has very
desirable properties.  However, these properties are derived from
the assumption that the sample values have been observed with roughly
the same relative frequency as they occur in the population.  For the
situation under consideration, though, every $x_i$ occurs in the sample
with relative frequency $1/m$ regardless of the true value of $p_i$.
Consequently, $G_1(t)$ was of interest in this study only as a basis
for comparison and generalization.

3.2.2  The Modified Empirical Distribution Function, $G_2(t)$

The major disadvantage of $G_1(t)$ is that it ignores the information
contained in the $p_i$'s.  Instead of assigning weight $1/m$ to
each sampled point, $G_2(t)$ assigns to $x_j$ the weight

$$p_j \Big/ \sum_{i=1}^{m} p_i . \tag{3.8}$$

3.2.3  The Continuous Estimator, $G_3(t)$

Although F is discrete, the subnetwork's actual duration
distribution is continuous.  Therefore, it was anticipated that a
continuous estimator might be in order.  The estimator $G_3(t)$ is
continuous and equals $G_2(t)$ at every sampled point but interpolates
linearly between sample points.

3.2.4  The Mixed Estimator, $G_4(t)$

The estimator $G_4(t)$ has the advantages of a continuous estimator
between sampled values but preserves the discrete nature at the
sampled values.  Like $G_3(t)$, $G_4(t)$ also equals $G_2(t)$ at every sampled
point.  However, at each sampled point $x_i$, $G_4(t)$ has a jump of size
$p_i$.  For t between $x_i$ and $x_{i+1}$, $G_4(t)$ interpolates linearly between
$G_2(x_i)$ and $G_2(x_{i+1}) - p_{i+1}$.

3.2.5  The Mixed Estimator, $G_5(t)$

Like $G_4(t)$, the estimator $G_5(t)$ also assigns the discrete jump
sizes $p_i$ to the sampled points.  However, this estimator spreads the
probability that is unaccounted for by the sampled values evenly over
the range $x_1$ to $x_m$.
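
The verbal descriptions in subsections 3.2.1 - 3.2.5 can be turned into a small Python sketch.  It is an illustration only, not the report's program: the function name `make_estimators` is hypothetical, the sampled values are assumed to be distinct and already ordered, and the piecewise forms follow the descriptions above (linear interpolation for $G_3$ and $G_4$, an even spread of the unsampled probability for $G_5$).

```python
from bisect import bisect_right


def make_estimators(x, p):
    """Build G1..G5 from an ordered sample x (distinct values, an assumption)
    and the probabilities p of the corresponding configurations."""
    m = len(x)
    total = sum(p)
    cum = [sum(p[:k]) for k in range(m + 1)]   # cum[k] = p_1 + ... + p_k

    def j_of(t):
        return bisect_right(x, t)              # number of sampled x_i <= t

    def g1(t):                                 # empirical d.f., weight 1/m
        return j_of(t) / m

    def g2(t):                                 # weights p_i / sum(p)
        return cum[j_of(t)] / total

    def _interp(t, jump):
        # Shared form of G3 (jump=False) and G4 (jump=True).
        if t < x[0]:
            return 0.0
        if t >= x[-1]:
            return 1.0
        j = j_of(t)                            # 1 <= j < m
        lo = cum[j] / total                    # G2 at x_j
        hi = cum[j + 1] / total - (p[j] if jump else 0.0)
        frac = (t - x[j - 1]) / (x[j] - x[j - 1])
        return lo + frac * (hi - lo)

    def g3(t):
        return _interp(t, False)

    def g4(t):
        return _interp(t, True)

    def g5(t):                                 # jumps p_i plus an even spread
        if t < x[0]:
            return 0.0
        if t >= x[-1]:
            return 1.0
        return cum[j_of(t)] + (t - x[0]) / (x[-1] - x[0]) * (1.0 - total)

    return g1, g2, g3, g4, g5


if __name__ == "__main__":
    # Sample data of (3.7).
    xs = [2, 2.5, 4.0, 4.5, 5]
    ps = [0.02, 0.03, 0.05, 0.07, 0.03]
    for name, g in zip(("G1", "G2", "G3", "G4", "G5"), make_estimators(xs, ps)):
        print(name, [round(g(t), 4) for t in (1, 2, 3, 4.5, 5)])
```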

3.3  Criteria  for  a Good  Estimator 

The  quantity  being  estimated  is  a distribution  function.  Thus, 
a "good"  estimator,  say  G(t),  should  have  the  properties  of  a 
distribution  function;  namely, 

(1)  $0 \le G(t) \le 1$, $-\infty < t < \infty$ and

(2)  $G(t_1) \le G(t_2)$ for $t_1 < t_2$.

In addition, it is desirable (but not requisite) that the estimator
be "consistent" in the sense that the estimate of the distribution
function based on a sample containing all possible $x_i$ is the true
distribution function of X.

All of the estimators exhibit the properties of a distribution
function if the sampling is performed without replacement.  However,
the estimators $G_4(t)$ and $G_5(t)$ may not be between zero and one for
all t if sampling is with replacement.

Only estimators $G_2(t)$, $G_4(t)$, and $G_5(t)$ are always consistent.
This follows immediately since if m = M and the sample corresponds
to all M activity duration configurations, then

$$\sum_{i=1}^{m} p_i = 1 . \tag{3.9}$$

Furthermore only $G_2(t)$ satisfies

$$\lim_{m \to \infty} G(t) = F(t) \tag{3.10}$$

if the sampling is done with replacement.

Since the mean and upper percentiles of the subnetwork's
duration distribution are the quantities of primary interest, this
work also sought an estimator whose estimates of a distribution
function's mean, $\mu$, 90th percentile, $P_{90}$, and 95th percentile, $P_{95}$,
exhibit a high degree of precision.  The simulation study described
in subsection 3.5 was designed to determine the suitability of the
proposed estimators in this regard.

3.4  Choosing Between Simple Random or Systematic Sampling

On the basis of computational ease alone, sampling with replacement
is the preferred method.  However, only $G_2(t)$ satisfies (3.10)
if the sampling is done with replacement.  Also $G_4(t)$ and $G_5(t)$ are
not necessarily distribution functions if the sampling is done with
replacement.  On the other hand, if systematic sampling is employed
these difficulties do not arise.  In addition, the simulation study
described in subsection 3.5 indicates that estimates derived
from systematic samples contain more information than estimates
based on simple random sampling.  This was anticipated
since Cochran (1946) showed that for at least partially ordered
populations the variance of $\bar{x} = \sum_{i=1}^{m} x_i / m$ under systematic sampling is
always less than it is under simple random sampling.  Hence, the
algorithm does its sampling via the systematic technique if possible.
Unfortunately, the way a computer represents integers in its memory
makes systematic sampling impractical if $M > M_0 = 2^{\alpha - 1} - 1$ where $\alpha$
is the number of binary bits in an integer word for the particular
machine being used.  For most modern IBM computers, $\alpha$ = 32, and thus
$M_0$ = 2,147,483,647.  When $M > M_0$, the algorithm uses random
sampling with replacement.
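
The choice between the two sampling methods can be sketched in Python as follows.  The report does not state how the systematic sample is positioned, so the random start and fixed interval used here are assumptions of this sketch, as are the function and argument names; only the threshold $2^{\alpha - 1} - 1$ comes from the text.

```python
import random


def choose_sample_indices(M, m, alpha=32, rng=None):
    """Pick m configuration indices in 1..M.

    Uses a systematic sample (random start, fixed interval; an assumed
    scheme) when M fits in a signed alpha-bit integer word, otherwise
    simple random sampling with replacement.
    """
    if not 1 <= m <= M:
        raise ValueError("need 1 <= m <= M")
    rng = rng or random.Random()
    m0 = 2 ** (alpha - 1) - 1            # largest representable integer
    if M <= m0:
        step = M // m                    # sampling interval
        start = rng.randint(1, step)     # random start in the first interval
        return "systematic", [start + k * step for k in range(m)]
    return "random", [rng.randint(1, M) for _ in range(m)]


if __name__ == "__main__":
    print(choose_sample_indices(1000, 10, rng=random.Random(1)))
    print(choose_sample_indices(2 ** 40, 5, rng=random.Random(1))[0])
```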

3.4.1  An  Ordering  Scheme 

Unfortunately, the relative magnitude of the subnetwork duration
corresponding to a particular activity duration configuration cannot
be determined in general unless all configurations are considered
explicitly.  Hence, the following approximate ordering scheme was
devised.

Let $v = 1, \ldots, 2^{n_c}$ index the $2^{n_c}$ configurations of activity
durations corresponding to

(1)  the duration of each activity not in the cluster being
     equal to either its upper duration or its lower duration
     depending on whether a lower bound or an upper bound,
     respectively, is being determined and

(2)  the durations for the activities in the cluster being at
     each of the $2^{n_c}$ possible combinations of their upper and
     lower points.

The activity duration configuration whose corresponding subnetwork
duration, say $x_v$, is approximately the v-th smallest subnetwork
duration can be determined from $g_c(v)$ defined by

$$g_c(v) = \text{the } k\text{-th smallest binary integer containing exactly } i \text{ "1"s} \tag{3.11}$$

where i is the smallest integer such that

$$v \le \sum_{j=0}^{i} \binom{n_c}{j} \tag{3.12}$$

and

$$k = v - \sum_{j=0}^{i-1} \binom{n_c}{j} \tag{3.13}$$

with

$$\sum_{j=0}^{-1} \binom{n_c}{j} = 0 . \tag{3.14}$$

In particular the activity duration configuration corresponding to
approximately the v-th smallest subnetwork duration has the j-th
activity in the cluster equal to its lower duration if the j-th digit
(counting from the least significant digit) in $g_c(v)$ is 0 and equal
to its upper duration if the j-th digit in $g_c(v)$ is 1.

For v = 1, $g_c(v) = 0_2$ (base 2), so under the approximate ordering
$x_1$ equals the subnetwork's duration when every activity has its lower
duration which in fact is the smallest possible x-value.  Similarly,
$x_{2^{n_c}}$ is the subnetwork's duration when every activity has its upper
duration and is the largest possible x-value.  For $1 < v_s < v_t < 2^{n_c}$,
$x_{v_s}$ is not necessarily less than or equal to $x_{v_t}$.  However, for $v_s$
very much smaller than $v_t$ the activity configuration corresponding to
$g_c(v_t)$ has more activities at their upper duration than the one
corresponding to $g_c(v_s)$.  Hence, $x_{v_t}$ is likely to be larger than $x_{v_s}$.
For example, Table 3 gives the approximate ordering of the x-values
for the small subnetwork pictured in Figure 8 and described in Table 2
for the case when all five of the subnetwork's activities are in the
cluster C.

Figure 8

A small subnetwork.


TABLE 2

The Activity Durations for the Subnetwork in Figure 8

Activity    $\ell$     u
    1        5      7
    2        8     10
    3        3      6
    4        5      6
    5        4      8


TABLE 3

The Approximate Ordering of the x-values for the Subnetwork in Figure 8

                   Activity durations corresponding to $g_c(v)$
  v    $g_c(v)$          5      4      3      2      1        $x_v$
       (base 2)
  1    00000          4      5      3      8      5         12
  2    00001          4      5      3      8      7         14
  3    00010          4      5      3     10      5         14
  4    00100          4      5      6      8      5         15
  5    01000          4      6      3      8      5         12
  6    10000          8      5      3      8      5         16
  7    00011          4      5      3     10      7         14
  8    00101          4      5      6      8      7         17
  9    00110          4      5      6     10      5         15
 10    01001          4      6      3      8      7         14
 11    01010          4      6      3     10      5         14
 12    01100          4      6      6      8      5         15
 13    10001          8      5      3      8      7         18
 14    10010          8      5      3     10      5         18
 15    10100          8      5      6      8      5         19
 16    11000          8      6      3      8      5         16
 17    00111          4      5      6     10      7         17
 18    01011          4      6      3     10      7         14
 19    01101          4      6      6      8      7         17
 20    01110          4      6      6     10      5         15
 21    10011          8      5      3     10      7         18
 22    10101          8      5      6      8      7         21
 23    10110          8      5      6     10      5         19
 24    11001          8      6      3      8      7         18
 25    11010          8      6      3     10      5         18
 26    11100          8      6      6      8      5         19
 27    01111          4      6      6     10      7         17
 28    10111          8      5      6     10      7         21
 29    11011          8      6      3     10      7         18
 30    11101          8      6      6      8      7         21
 31    11110          8      6      6     10      5         19
 32    11111          8      6      6     10      7         21
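
The mapping from v to $g_c(v)$ in (3.11) - (3.14) can be checked against Table 3 with a few lines of Python.  This is a brute-force sketch for small clusters only; the function names `rank_to_i_k` and `g_c` are chosen here for illustration.

```python
from itertools import combinations
from math import comb


def rank_to_i_k(v, n_c):
    """Return (i, k) of (3.12)-(3.13) for 1 <= v <= 2**n_c."""
    below = 0                              # sum of C(n_c, j) for j < i
    for i in range(n_c + 1):
        if v <= below + comb(n_c, i):
            return i, v - below
        below += comb(n_c, i)
    raise ValueError("v must lie between 1 and 2**n_c")


def g_c(v, n_c):
    """k-th smallest n_c-digit binary integer with exactly i ones,
    found by listing all candidates (practical only for small n_c)."""
    i, k = rank_to_i_k(v, n_c)
    values = sorted(sum(1 << b for b in bits)
                    for bits in combinations(range(n_c), i))
    return values[k - 1]


if __name__ == "__main__":
    lower = {1: 5, 2: 8, 3: 3, 4: 5, 5: 4}   # Table 2, lower durations
    upper = {1: 7, 2: 10, 3: 6, 4: 6, 5: 8}  # Table 2, upper durations
    for v in range(1, 8):
        g = g_c(v, 5)
        # digit j (from the right) equal to 1 means activity j is at its upper duration
        durations = [upper[j] if (g >> (j - 1)) & 1 else lower[j]
                     for j in range(5, 0, -1)]
        print(v, format(g, "05b"), durations)
```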

3.4.2  Implementing  the  Ordering  Scheme 


An efficient algorithm for finding the k-th smallest binary
integer containing exactly i "1"s was developed.  The algorithm is
as follows:

1.  Let

    NP = the number of binary digits whose values are as yet
         undetermined,

    NI = the number of the NP remaining digits that are to be
         assigned the value "1", and

    J  = the location of the digit whose value is currently
         being determined as counted from the right.

2.  Set NP = $n_c$, NI = i, J = $n_c$, B = $\binom{n_c}{i}$, and R = B - k + 1.

3.  If NI < 1, assign the value "0" to all remaining digits
    and stop.  Otherwise, set

        B  = B x NI/NP
        NI = NI - 1
        NP = NP - 1
        RR = R - B.

    If RR $\le$ 0, go to 4.  Otherwise, go to 5.

4.  Assign the J-th right-most digit the value "1".  Set
    J = J - 1.  Go to 3.

5.  Assign the J-th right-most digit the value "0".  Set

        J  = J - 1
        R  = RR
        B  = B (NP - NI)/NP
        NP = NP - 1
        RR = R - B.

    If RR $\le$ 0, go to 4.  Otherwise, do 5 again.
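
For comparison, the same digit-by-digit idea can be written compactly in Python.  The sketch below is not a transcription of the steps above: it counts from the smallest integer upward and calls `math.comb` at each digit rather than updating a running binomial coefficient, and the function name is hypothetical.

```python
from math import comb


def kth_smallest_with_ones(n_c, i, k):
    """Return the k-th smallest (k = 1, 2, ...) n_c-digit binary integer
    containing exactly i ones."""
    if not (0 <= i <= n_c and 1 <= k <= comb(n_c, i)):
        raise ValueError("need 0 <= i <= n_c and 1 <= k <= C(n_c, i)")
    value = 0
    ones_left = i
    for pos in range(n_c, 0, -1):              # pos plays the role of J
        if ones_left == 0:
            break                              # all remaining digits are 0
        with_zero = comb(pos - 1, ones_left)   # candidates whose digit pos is 0
        if k > with_zero:
            value |= 1 << (pos - 1)            # digit pos is 1
            k -= with_zero                     # skip the smaller candidates
            ones_left -= 1
    return value


if __name__ == "__main__":
    for k in range(1, 11):
        print(k, format(kth_smallest_with_ones(5, 2, k), "05b"))
```

Because each digit needs only one binomial coefficient, the cost grows linearly in $n_c$ instead of with the number of candidate integers.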

3.5  The  Simulation  Study 

Since a subnetwork's duration distribution is the distribution
of the maximum path length, most subnetwork duration distributions
are skewed left.  Nevertheless, the behavior of the proposed estimators
$G_1(t), \ldots, G_5(t)$ was determined for samples drawn from populations
exhibiting a variety of distributional shapes.  Since the beta
distribution with probability density function (p.d.f.)

$$\beta_{\alpha,\beta}(t) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \, t^{\alpha - 1} (1 - t)^{\beta - 1} , \quad \alpha, \beta > 0 , \; 0 < t < 1 \tag{3.15}$$

is a finite range distribution which can assume a wide variety of
shapes, the subnetwork duration distributions corresponded to

$$\ldots , \; \beta_{2,23}(t)$$

in the simulation study.  (These represent the shapes highly skewed
left, skewed left, symmetrical, skewed right, and highly skewed
right, respectively, as indicated in Figure 9.)

Figure 9

Duration distributions used in the simulation study.

The discrete subnetwork duration distribution, say $F_{\alpha,\beta}$,
corresponding to $\beta_{\alpha,\beta}$ was constructed by randomly selecting

$$t_1 \le t_2 \le \cdots \le t_{999} \le t_{1000} ,$$

and defining

$$F_{\alpha,\beta}(t) = \begin{cases} 0 , & t < t_1 \\ \int_0^{t_i} \beta_{\alpha,\beta}(x) \, dx , & t_i \le t < t_{i+1} \\ 1 , & t \ge 1.0 . \end{cases} \tag{3.16}$$

Therefore in the simulation study M = 1000.  Also, since $\int_0^t \beta_{\alpha,\beta}(x) \, dx$
does not exist in closed form unless t = 1, it was evaluated at each
$t_i < 1$ using an approximation due to Peizer and Pratt (1963).

Samples of size m were then taken from each of the $F_{\alpha,\beta}(t)$ and
the estimators $G_1(t), \ldots, G_5(t)$ were calculated.  In the subnetwork
analysis procedure, sampling is only employed when practical
considerations dictate the explicit consideration of only a relatively
small proportion of the subnetwork's activity duration configurations.
Hence, the only sample sizes considered by this work were m = 10, 20,
50, and 100 which represent sampling proportions of 1%, 2%, 5%, and
10%, respectively.

The performance of the five proposed estimators with respect to
estimation of the parameters $\mu$, $P_{95}$, and $P_{90}$ was evaluated for each
of the $F_{\alpha,\beta}(t)$ on the basis of the following three criteria

(i)  mean deviation,

$$MD(\hat{\gamma}) = \sum_{i=1}^{200} \frac{|\hat{\gamma}_i - \gamma|}{200} , \tag{3.17}$$

(ii)  mean square error,

$$MSE(\hat{\gamma}) = \sum_{i=1}^{200} \frac{(\hat{\gamma}_i - \gamma)^2}{200} , \tag{3.18}$$

(iii)  bias,

$$BIAS(\hat{\gamma}) = \sum_{i=1}^{200} \frac{\hat{\gamma}_i - \gamma}{200} , \tag{3.19}$$

where in each case

$\gamma$ = the parameter being estimated ($\gamma$ = $\mu$, $P_{90}$, or $P_{95}$);

$\hat{\gamma}_i$ = an estimate of $\gamma$ based on the i-th sample of size m drawn;
and

200 = the number of samples of size m drawn.
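
The three criteria are simple averages over the repeated samples.  A minimal Python sketch is given below; the function names and the example numbers are illustrative only and do not come from the simulation study.

```python
def mean_deviation(estimates, gamma):
    """Average absolute deviation of the estimates from the true value."""
    return sum(abs(g - gamma) for g in estimates) / len(estimates)


def mean_square_error(estimates, gamma):
    """Average squared deviation of the estimates from the true value."""
    return sum((g - gamma) ** 2 for g in estimates) / len(estimates)


def bias(estimates, gamma):
    """Average signed deviation (estimate minus true value)."""
    return sum(g - gamma for g in estimates) / len(estimates)


if __name__ == "__main__":
    # Hypothetical estimates of a parameter whose true value is 1.0.
    sample_estimates = [0.9, 1.1, 1.3]
    print(mean_deviation(sample_estimates, 1.0))
    print(mean_square_error(sample_estimates, 1.0))
    print(bias(sample_estimates, 1.0))
```

The mean deviation is never smaller than the absolute value of the bias, and the two coincide when every estimate errs in the same direction.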

3.5.1  A Comparison of $G_1(t), \ldots, G_5(t)$ under Systematic Sampling

Tables 4-8 indicate the simulation results for the proposed
estimators under systematic sampling.  The following observations
may be made:

(1)  For every case considered, $G_1(t)$ and $G_5(t)$ yielded highly
     biased estimates of at least one of the 3 parameters $\mu$,
     $P_{95}$, and $P_{90}$ while $G_2(t)$, $G_3(t)$, and $G_4(t)$ are only
     moderately biased.


TABLE 4

Simulation Results for the Highly Skewed Left $F_{23,2}(t)$ Having $\mu$ = 92122, $P_{95}$ = 98763, $P_{90}$ = 97805*

Percent                        Mean                    95th percentile               90th percentile
sampling  Estimator     Value     MD    MSE   BIAS    Value     MD    MSE   BIAS    Value     MD    MSE   BIAS
1%        G1            50149  41972  17700 -41972   100000   1237     15  -1237   100000   2195     48  -2195
          G2            91275   2592     99   -846    97518   2586    126  -1246    95593   3538    200  -2212
          G3            85677   6454    508  -6445    95508   3471    201  -3255    93806   4241    269  -3999
          G4            85724   6408    503  -6398    95575   3406    196  -3188    93888   4166    263  -3917
          G5            50482  41640  17340 -41640    95200   3562    128  -3563    90523   7282    536  -7282
2%        G1            49986  42135  17774 -42135   100000   1237     15  -1237    97642   1343     27   -164
          G2            91480   1971     60   -641    97424   1801     56  -1339    96218   2189     90  -1587
          G3            88844   3320    158  -3278    96381   2437     85  -2382    94870   2987    134  -2935
          G4            88894   3286    155  -3227    96470   2350     81  -2293    94958   2903    130  -2847
          G5            50928  41193  16972 -41193    95430   3334    114  -3334    91132   6674    458  -6674
5%        G1            49884  42238  17843 -42238    97469   1294     21  -1294    92400   5405    296  -5405
          G2            91902   1101     19   -220    98225    854     14   -539    97481   1182     22   -324
          G3            90844   1524     33  -1277    97466   1335     29  -1297    96660   1375     30  -1145
          G4            90895   1496     33  -1227    97531   1283     28  -1232    96735   1322     28  -1070
          G5            52108  40014  16019 -40014    95897   2867     85  -2867    92388   5418    310  -5418
10%       G1            49851  42271  17869 -42271    97010   1754     33  -1754    90963   6843    469  -6443
          G2            91922    904     13   -200    98598    561      4   -165    97616    764      9   -189
          G3            91375   1048     17   -746    98107    695      8   -656    97226    882     11   -579
          G4            91429   1030     17   -692    98153    657      7   -609    97307    822     10   -498
          G5            54153  37968  14442 -37968    96490   2274     55  -2274    93792   4012    179  -4012

* All entries have been multiplied by $10^5$.


TABLE 5

Simulation Results for the Skewed Left $F_{8,2}(t)$ Having $\mu$ = …*

Percent                        Mean                    95th percentile               90th percentile
sampling  Estimator     Value     MD    MSE   BIAS    Value     MD    MSE   BIAS    Value     MD    MSE   BIAS
1%        G1            50149  29978      … -29978   100000   3992    159  -1992   100000   5900    348  -5900
          G2            79420   3674    203   -707    92007   5005    384  -4001    90558   5024    416  -3542
          G3            73134   7247    711  -6993    89697   6356    563  -6310    87207   6949    666  -6893
          G4            73184   7206    704  -6944    89770   6292    554  -6238    87270   6895    658  -6830
          G5            50374  29754   8853 -29754    95034    981     10   -973    90127   3973    158  -3973
2%        G1            49986  30141   9105 -30141   100000   3992    159  -3992    97641   3542    152  -3542
          G2            79937   2305    101   -191    93913   3182    179  -2095    92167   3204    184  -1933
          G3            77170   3617    190  -2957    92033   4031    255  -3975    90077   4155    280  -4022
          G4            77220   3584    187  -2907    92097   3984    251  -3911    90133   4114    276  -3967
          G5            50679  29449   8672 -29449    95066    959      9   -941    90261   3839    148  -3839
5%        G1            49884  30243   9150 -30243    97469   1467     26   1461    92400   1699     32  -1699
          G2            79795   1626     43   -332    95200   1762     47   -808    93135   1826     51   -964
          G3            78726   2001     62  -1401    94201   2011     63  -1807    92188   2174     68  -1912
          G4            78777   1974     61  -1350    94267   1971     61  -1741    92246   2134     66  -1854
          G5            51543  28584   8172 -29594    05134    900      9   -874    90595   3505    125  -3505
10%       G1            49851  30277   9168 -30277    97010   1002     12   1002    90963   3137     99  -3137
          G2            79708   1987     53   -420    95166   1140     21   -842    93456   1668     41   -644
          G3            79167   2157     62   -960    94745   1382     29  -1263    92850   1786     45  -1250
          G4            79220   2143     61   -907    94822   1344     28  -1186    92902   1760     44  -1198
          G5            53046  27081   7340 -27081    95224    820      8   -783    91076   3023     96  -3033

* All entries have been multiplied by $10^5$.


TABLE 6

Simulation Results for the Symmetric $F_{\alpha,\beta}(t)$ Having $\mu$ = 50107, $P_{95}$ = 74892, $P_{90}$ = 69918*

Percent                        Mean                    95th percentile               90th percentile
sampling  Estimator     Value     MD    MSE   BIAS    Value     MD    MSE   BIAS    Value     MD    MSE   BIAS
1%        G1                …      …      …      …        …      …      …      …        …      …      …      …
          G2                …      …      …      …    71385   5230      …      …    67119   7398    898  -2799
          G3            43308   6985    658  -6800    67175   8260   1023  -7718    62090   8498   1113  -7828
          G4            43360   6939    651  -6748    67251   8192   1011  -7642    62160   8437   1100  -7758
          G5            50133     34      0     25    94973  20080   4032  20080    89947  20029   4012  20029
2%        G1            49986   1224     21   -121   100000  25108   6304  25108    97642  27724   7713  27724
          G2            49808   3334    164   -299    73208   4345    259  -1684    68825   4068    237  -1093
          G3            46864   4075    264  -3243    70871   4926    357  -4022    66021   4974    346  -3897
          G4            46918   4050    261  -3189    70938   4876    351  -3955    66085   4934    341  -3833
          G5            50130     60      0     22    94920  20028   4011  20027    89850  19932   3973  19932
5%        G1            49884    514      3   -223    97469  22576   5101  22576    92400  22482   5058  22482
          G2            49743   2328     76   -364    74020   2527     97      …    69763   2683    115   -155
          G3            48610   2530     97  -1497    72953   2923    133  -1939    68610   2665    115  -1308
          G4            48665   2515     95  -1442    73019   2873    118  -1873    68673   2632    113  -1245
          G5            50122    117      0     14    94761  19869   3948  19869    89600  19682   3874  19682
10%       G1            49851    307      1    257    97010  22117   4894  22117    90963  21044   4430  21044
          G2            50026   1251     21    -81    74652   1564     41   -240    69884   2006     64    -34
          G3            49446   1286     25   -661    74138   1782     48   -754    69281   1999     63   -637
          G4            49503   1278     24   -603    74194   1760     47   -699    69343   1975     62   -575
          G5            50136    123      0     28    94570  19677   3872  19677    89288  19370   3753  19370

* All entries have been multiplied by $10^5$.


TABLE 7

Simulation Results for the Skewed Right $F_{2,8}(t)$ Having $\mu$ = 20067, $P_{95}$ = 42929, $P_{90}$ = 36912*

Percent                        Mean                    95th percentile               90th percentile
sampling  Estimator     Value     MD    MSE   BIAS    Value     MD    MSE   BIAS    Value     MD    MSE   BIAS
1%        G1            50149  30082   9133   3082   100000  57071  32571  57071   100000  63088  39801  63088
          G2            20209   3583    181    141    41767   5436    490  -1162    35497   6699    718  -1414
          G3            15120   5244    389  -4946    37129   6688    671  -5800    30775   7338    871  -6136
          G4            15148   5220    385  -4919    37188   6640    663  -5741    30839   7285    861  -6073
          G5            49894  29827   8897  29827    94794  52044  27086  52044    89948  53036  28128  53036
2%        G1            49986  29919   8972  29919   100000  57071  32571  57071    97641  60730  36908  60730
          G2            19867   2283     84   -200    41938   3809    221   -991    36214   4003    268   -698
          G3            17288   3233    156  -2779    39630   4479    304  -3299    33979   4576    314  -2933
          G4            17332   3200    153  -2735    39683   4446    301  -3247    34041   4534    309  -2870
          G5            49562  29495   8700  29495    94918  51988  27028  51988    89841  52929  28015  52929
5%        G1            49884  29817   8894  29817    97469  54540  29750  54540    92400  55489  30793  55489
          G2            20034   1375     32    -33    42323   2114     72   -606    36311   2596    102   -600
          G3            19008   1793     42  -1059    41319   2390     87  -1610    35474   2739    113  -1437
          G4            19058   1764     41  -1008    41368   2371     84  -1561    35530   2707    111  -1382
          G5            48635  28568   8162  28568    94753  51823  26857  51823    89594  52682  27755  52682
10%       G1            49851  29784   8871  29784    97010  54080  29249  54080    90963  54051  29216  54051
          G2            20093    960     13     26    42990   1788     48     61    36832   1778     48   -380
          G3            19563   1062     15   -504    42488   1780     47   -441    36122   1902     54   -790
          G4            19615   1040     15   -452    42538   1781     47   -391    36183   1867     52   -729
          G5            47141  27075   7332  27075    94547  51617  26643  51617    89182  52270  27322  52270

* All entries have been multiplied by $10^5$.


TABLE  8 


Simulation  Results  for  the  Highly  Skewed  Right  F (t)  Having  n = 8072,  P = 18309,  P = 15444* 

yS-  Qfj 
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s 
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74 
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13125 

5164 
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-2319 

i% 

s 

3950 

4139 
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-4123 

12584 

5965 

519 

-5725 

9742 

5981 
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-5702 

3894 

4192 

197 

-4178 

12648 

5909 

511 

-5662 

9795 

5934 

493 

-5649 

S 

49724 

41652 

17350 

41652 

94966 

76657 

58762 

76657 

89932 

74488 

55484 

74488 

S 

49986 

41914 

17588 

41914 

100000 

81691 

66734 

81691 

97642 

82197 

67591 

82197 

7664 

1217 

23 

-408 

17050 

3421 

173 

-126 

14211 

2867 

126 

-1233 

2% 

^3 

5749 

2404 

73 

-2322 

14976 

3706 

206 

-2224 

12394 

3532 

173 

-3050 

"^4 

5757 

2394 

72 

-2315 

15037 

3656 
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-3272 

12453 

3489 
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49257 

41184 

16963 
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94908 
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76599 

89825 

74381 
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17486 

41812 

97469 
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s 
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-983 
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67 
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21 

-949 
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65 
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50 
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78700 
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75518 

""2 

7953 
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6 
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17991 
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18 
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15410 

1085 

16 

-34 

10% 

s 

7469 
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10 

-603 

17476 

1221 

22 

-833 

14854 

1059 

16 

-590 

7513 

738 

9 

-559 

17527 

1190 

21 

-782 

14896 
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16 
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s 

45895 

37823 

14311 

37823 

94548 

76238 

58123 

76238 

89198 

73754 

54398 

73754 

* All entries have been multiplied by 10^5.
(2)  In the vast majority of the cases considered, G3(t) and G4(t)
performed virtually the same in all respects while G2(t)
performed approximately twice as well as either G3(t) or
G4(t).

(3)  The relative performances of G1(t), ..., G5(t) remained
virtually unchanged as m increased.

(4)  As would be expected, all five estimators increased in
precision as sample size increased.

On  the  basis  of  these  observations,  G2(t)  appears  to  be  the  "best" 
estimator  and  is  the  one  implemented  by  the  subnetwork  analysis 
procedure. 

3.5.2  The Performance of G2(t) under Systematic and Random Sampling

Table 9 presents a comparison of the performance of the estimator
G2(t) for both systematic sampling and random sampling under a variety
of sampling conditions.  In almost every case, systematic sampling
was superior to random sampling and hence is the preferred technique.


TABLE  9 
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.5646 
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1% 

^5,5 
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.4827 
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.6764 

.3800 

.4096 

.5607 

.6690 

.3602 

.3772 

.5559 

2% 

^.5 

.7842 

.5655 

.5620 
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4.  ESTIMATION  OF  A DISCRETE  DISTRIBUTION  FUNCTION 
BY  EXTRAPOLATING  UPPER  AND  LOWER  BOUNDS 

4.1  Introduction 

Since it is sometimes impractical to completely enumerate a
subnetwork's discrete duration distribution, F, the bounds
$F^+(t; \theta, \lambda)$ and $F^-(t; \theta, \lambda)$ are calculated
as a first step in the determination of an estimate, $\hat{F}$, of F.
Theorems 4 and 5 of section 2 imply that for $\theta$ very large both
$F^+(t; \theta, \lambda)$ and $F^-(t; \theta, \lambda)$ may serve as
adequate estimates of F.  Unfortunately, it becomes increasingly
laborious to calculate these quantities as $\theta \to \infty$.  Hence
the extrapolation procedure described in this section was devised as a
practical alternative to evaluating the upper and lower bounds for
large $\theta$.

4.2  The  Extrapolation  Problem 

Suppose that for a particular subnetwork, the numerical values
of $F^+(t; \theta, \lambda)$ and $F^-(t; \theta, \lambda)$ are available
for each of the combinations of $t = t_1, \ldots, t_I$ and
$(\theta, \lambda) = (\theta_1, \lambda_1), \ldots, (\theta_J, \lambda_J)$
where

(1)  $t_i < t_{i+1}$ for all i and

(2)  $\theta_j \le \theta_{j+1}$ and $\lambda_j \le \lambda_{j+1}$ for all j.

The specific goal of the extrapolation procedure is to estimate F at
the points $t_1, t_2, \ldots, t_I$.

Let

$$\omega = 1/(1 + \theta), \qquad \omega_j = 1/(1 + \theta_j), \quad j = 1, \ldots, J, \tag{4.1}$$

and define

(a)  $H^+(t, \omega) = F^+(t; \theta, \lambda)$ and

(b)  $H^-(t, \omega) = F^-(t; \theta, \lambda)$.  (4.2)

Then the results of Theorems 4 and 5 can be restated as follows:

(a)  $H^+(t, \omega)$ is a nondecreasing function of $\omega$ for every
t; $H^-(t, \omega)$ is a nonincreasing function of $\omega$ for
every t;

(b)  for any $\omega$ and t,

$$H^+(t, \omega) \ge F(t) \ge H^-(t, \omega);$$

and

(c)  there exists a finite value $\omega^*$ such that $\omega \le \omega^*$
implies $H^+(t, \omega) = H^-(t, \omega) = F(t)$ for every t.

Thus, estimating $H^+(t, 0)$ and $H^-(t, 0)$ is the same as estimating
F(t).  Although viable estimates of F(t) can be obtained in a variety
of ways, the proposed procedure uses the known quantities
$H^+(t_i, \omega_j)$ and $H^-(t_i, \omega_j)$ (i = 1, ..., I; j = 1, ..., J)
to estimate functions $\hat{H}^+(t, \omega)$ and $\hat{H}^-(t, \omega)$
satisfying

(1)  $\hat{H}^+(t_i, \omega_j) \ge \hat{H}^+(t_i, \omega_{j+1})$ for each i and j;

(2)  $\hat{H}^-(t_i, \omega_j) \le \hat{H}^-(t_i, \omega_{j+1})$ for each i and j; and

(3)  $\hat{H}^+(t_i, 0) = \hat{H}^-(t_i, 0) \le \hat{H}^+(t_{i+1}, 0) = \hat{H}^-(t_{i+1}, 0)$ for each i,  (4.3)

and then estimates $F(t_i)$ by

$$\hat{F}(t_i) = \hat{H}^+(t_i, 0) = \hat{H}^-(t_i, 0), \quad i = 1, \ldots, I. \tag{4.4}$$

The basic idea is simply for each $t_i$ to fit a function
$\hat{H}^+(t_i, \omega)$ (as a function of $\omega$) to the sequence of
upper bounds on $F(t_i)$, namely $H^+(t_i, \omega_1), \ldots, H^+(t_i, \omega_J)$,
and also fit a function $\hat{H}^-(t_i, \omega)$ to the lower bounds on
$F(t_i)$ under the restriction that

$$\lim_{\omega \to 0} \hat{H}^+(t_i, \omega) = \lim_{\omega \to 0} \hat{H}^-(t_i, \omega). \tag{4.5}$$

Since $F(t_1) \le \cdots \le F(t_I)$, the additional restriction that

$$\hat{H}^+(t_1, 0) \le \cdots \le \hat{H}^+(t_I, 0) \tag{4.6}$$

is imposed so that

$$\hat{F}(t_1) \le \cdots \le \hat{F}(t_I). \tag{4.7}$$

4.3  A Linear Programming Solution to the Extrapolation Problem

The determination of $\hat{H}^+$ and $\hat{H}^-$ is as follows.  For each i let

$$\hat{H}^+(t_i, \omega) = \alpha_{0i} + \alpha_{1i}\omega + \alpha_{2i}\omega^2 \tag{4.8}$$

and

$$\hat{H}^-(t_i, \omega) = \beta_{0i} - \beta_{1i}\omega + \beta_{2i}\omega^2 \tag{4.9}$$

where $\alpha_{0i}$, $\alpha_{1i}$, $\alpha_{2i}$, $\beta_{0i}$, $\beta_{1i}$,
and $\beta_{2i}$ are constants determined so that (4.3) holds.  Since
(4.8) is a quadratic function in $\omega$, requirement (1) of (4.3) is
met by requiring

$$\alpha_{1i} \ge 0, \quad i = 1, \ldots, I. \tag{4.10}$$

Similarly, requirement (2) of (4.3) is met by restricting

$$\beta_{1i} \ge 0, \quad i = 1, \ldots, I. \tag{4.11}$$

Finally, requirement (3) is met by requiring

$$\alpha_{0i} = \beta_{0i} \le \alpha_{0,i+1} = \beta_{0,i+1}, \quad i = 1, \ldots, I - 1. \tag{4.12}$$

Of course, when the restrictions (4.10), (4.11), and (4.12) are
enforced, it is not always possible to have

(1)  $\hat{H}^+(t_i, \omega_j) = H^+(t_i, \omega_j)$ and

(2)  $\hat{H}^-(t_i, \omega_j) = H^-(t_i, \omega_j)$  (4.13)

for all i and j.  Hence, the constants $\alpha_{0i}$, $\alpha_{1i}$,
$\alpha_{2i}$, $\beta_{0i}$, $\beta_{1i}$, and $\beta_{2i}$
(i = 1, ..., I) are determined by minimizing

$$\sum_{j=1}^{J} a(\omega_j) \Big[ \sum_{i=1}^{I} \big( |\hat{H}^+(t_i, \omega_j) - H^+(t_i, \omega_j)| + |\hat{H}^-(t_i, \omega_j) - H^-(t_i, \omega_j)| \big) \Big] \tag{4.14}$$

under the restrictions (4.10) - (4.12) where $a(\omega_j)$ is a specified
nonnegative weighting constant.

The weights, $a(\omega)$, in (4.14) should reflect the increase in
information about F(t) as $\omega \to 0$ (i.e., $\theta \to \infty$).  In
the algorithm the weight $a(\omega)$ has been defined to be

$$a(\omega) = 1 - 3\omega^2 + 2\omega^3.$$

The coefficients in the cubic function were selected so that

$$a(0) = 1, \quad a(1) = 0, \quad \text{and} \quad \frac{da(\omega)}{d\omega}\Big|_{\omega=0} = \frac{da(\omega)}{d\omega}\Big|_{\omega=1} = 0. \tag{4.15}$$

Hence, the points $\omega = .25$ and $\omega = .5$, which correspond to
$\theta = 3$ and $\theta = 1$, respectively, have weights .84375 and
.50000, respectively.
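
As a quick numerical check of the weighting scheme, the following minimal sketch evaluates the cubic that satisfies the four conditions in (4.15) at the two quoted points. The function names `omega` and `weight` are illustrative choices, not names from the report's computer implementation.

```python
def omega(theta: float) -> float:
    """Map theta to omega = 1 / (1 + theta), as in (4.1)."""
    return 1.0 / (1.0 + theta)


def weight(w: float) -> float:
    """Cubic with a(0) = 1, a(1) = 0 and zero slope at w = 0 and w = 1."""
    return 1.0 - 3.0 * w**2 + 2.0 * w**3


# Weights at the two points quoted in the text (theta = 3 and theta = 1).
for theta in (3.0, 1.0):
    w = omega(theta)
    print(f"theta = {theta}: omega = {w}, a(omega) = {weight(w)}")
```

Because the weight falls smoothly from 1 to 0 as omega goes from 0 to 1, bounds computed for large theta dominate the fit.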


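The minimization of (4.14) subject to (4.10) - (4.12) can be
restated as

$$\text{minimize} \quad \sum_{j=1}^{J} a(\omega_j) \sum_{i=1}^{I} (u_{ij} + v_{ij})$$

subject to

$$-u_{ij} \le \sum_{k=1}^{i} \gamma_k + \alpha_{1i}\omega_j + (\alpha_{2i1} - \alpha_{2i2})\omega_j^2 - H^+(t_i, \omega_j) \le u_{ij} \quad \text{for all } i, j$$

$$-v_{ij} \le \sum_{k=1}^{i} \gamma_k - \beta_{1i}\omega_j + (\beta_{2i1} - \beta_{2i2})\omega_j^2 - H^-(t_i, \omega_j) \le v_{ij} \quad \text{for all } i, j$$

$$\sum_{k=1}^{I} \gamma_k \le 1 \tag{4.16}$$

$$u_{ij},\ v_{ij},\ \gamma_i,\ \beta_{1i},\ \alpha_{1i},\ \alpha_{2i1},\ \alpha_{2i2},\ \beta_{2i1},\ \beta_{2i2} \ge 0 \quad \text{for all } i, j$$

where

$$\alpha_{2i} = \alpha_{2i1} - \alpha_{2i2}, \tag{4.17}$$

$$\beta_{2i} = \beta_{2i1} - \beta_{2i2}, \tag{4.18}$$

and

$$\alpha_{0i} = \beta_{0i} = \sum_{k=1}^{i} \gamma_k. \tag{4.19}$$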

This is a linear programming problem which may be solved using any
standard method.  In the computer implementation of the subnetwork
analysis procedure, a streamlined version of the revised simplex
algorithm was especially prepared and implemented to solve this
problem.  Once this linear programming problem has been solved,
$\hat{F}$ is retrieved through the relation

$$\hat{F}(t_i) = \sum_{k=1}^{i} \gamma_k \quad \text{for all } i. \tag{4.20}$$

If the upper bounds $F^+(t; \theta, \lambda)$ and lower bounds
$F^-(t; \theta, \lambda)$ were determined without sampling then

$$H^+(t_i, \omega_1) \ge \cdots \ge H^+(t_i, \omega_J) \ge F(t_i) \ge H^-(t_i, \omega_J) \ge \cdots \ge H^-(t_i, \omega_1).$$

However this relationship does not necessarily hold if sampling is
used in the determination of the bounds.  Hence the determination of
$\hat{H}^+(t, \omega)$ and $\hat{H}^-(t, \omega)$ does not include the
restriction

$$H^+(t_i, \omega_J) \ge \hat{F}(t_i) \ge H^-(t_i, \omega_J), \quad i = 1, \ldots, I.$$

It  should  also  be  noted  that,  if  a weighted  least  squares 
criterion  had  been  used  instead  of  minimization  of  a weighted  sum 
of  absolute  residuals,  then  the  determination  of  F(t)  would  have  been 
a quadratic  programming  problem  instead  of  a somewhat  simpler  linear 
programming  problem. 
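
To make the fitting criterion concrete, here is a minimal sketch that only evaluates the weighted sum of absolute residuals in (4.14) for candidate quadratic fits sharing a common intercept. It is not the report's simplex code and does not solve the linear program. The data and the names `quad`, `objective`, and `fits` are hypothetical, and the lower fit is parameterised by a signed linear coefficient `b1` purely for convenience.

```python
def weight(w):
    """Cubic weight with a(0) = 1, a(1) = 0 and zero end slopes."""
    return 1.0 - 3.0 * w**2 + 2.0 * w**3


def quad(c0, c1, c2, w):
    """Quadratic c0 + c1*w + c2*w^2 used for both fitted bounds."""
    return c0 + c1 * w + c2 * w * w


def objective(omegas, upper, lower, fits):
    """Weighted sum of absolute residuals.

    upper[i][j], lower[i][j]: known bounds at point t_i and omega_j.
    fits[i] = (c0, a1, a2, b1, b2): shared intercept c0, then the
    linear and quadratic coefficients of the upper and lower fits.
    """
    total = 0.0
    for j, w in enumerate(omegas):
        inner = 0.0
        for i, (c0, a1, a2, b1, b2) in enumerate(fits):
            inner += abs(quad(c0, a1, a2, w) - upper[i][j])
            inner += abs(quad(c0, b1, b2, w) - lower[i][j])
        total += weight(w) * inner
    return total


# Hypothetical data: one time point, two omega values.
omegas = [0.5, 0.25]
upper = [[0.80, 0.65]]
lower = [[0.20, 0.35]]
exact_fit = [(0.5, 0.6, 0.0, -0.6, 0.0)]    # passes through every bound
shifted_fit = [(0.6, 0.6, 0.0, -0.6, 0.0)]  # intercept off by 0.1
print(objective(omegas, upper, lower, exact_fit))
print(objective(omegas, upper, lower, shifted_fit))
```

The shared intercept `c0` plays the role of the estimate of F at that point, so an error in it is penalised at every omega, most heavily at the smallest one.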

4.4  An  Example  of  the  Linear  Programming  Solution 

Using  the  simplex  algorithm  referred  to  in  subsection  4.3,  the 
linear  programming  problem  (4.16)  was  solved  for  the  data  in  Table  10. 
Figure  10  indicates  the  fits  obtained. 
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TABLE  10 


Extrapolation  Data 


                                      F+ (H+)    F- (H-)

θ1 = 1.5   (ω1 = .4)        t1         .7         0.0
                            t2         .8          .3
                            t3        1.0          .5

θ2 = 2.5   (ω2 = .2857)     t1         .4          .15
                            t2         .6          .3
                            t3         .75         .6

θ3 = 3.5   (ω3 = .2222)     t1         .35         .25
                            t2         .55         .45
                            t3         .75         .65

As required, $\hat{F}(t_1) = .31 \le \hat{F}(t_2) = .43 \le \hat{F}(t_3) = .71$.

Figure  10 

Extrapolation  results  for  the  data  in  Table  10. 


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5.  POTENTIAL  MODIFICATIONS  OF  THE 
SUBNETWORK  ANALYSIS  PROCEDURE 


5.1  Introduction

The objective of Subnetwork Analysis is to determine each
subnetwork's duration distribution, say F(t).  When this step is begun,
each activity has a specified duration distribution.  Let

n = number of activities in the subnetwork,

$X_i$ = the duration of activity i, and

$F_{X_i}(t)$ = the c.d.f. for activity i.

Also, let

m = the number of paths through the subnetwork, and

$Y_j$ = the length of the j-th path through the subnetwork

$$= \sum_{i=1}^{n} \delta_{ij} X_i \tag{5.1}$$

where

$\delta_{ij}$ = 1 if activity i is on the j-th path
= 0 otherwise.

Let the maximum path length be

$$Y^* = \max_{1 \le j \le m} Y_j. \tag{5.2}$$

Then

$$F(t) = P(Y^* \le t) = \int_{-\infty}^{t} \cdots \int_{-\infty}^{t} dF_{Y_1, \ldots, Y_m}(t_1, \ldots, t_m) \tag{5.3}$$

where

$F_{Y_1, \ldots, Y_m}(t_1, \ldots, t_m)$ = the joint distribution of the m paths.
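
For a very small subnetwork with discrete activity durations, F(t) = P(Y* <= t) can be computed directly by enumerating every joint outcome of the independent activity durations. The sketch below does this for a Wheatstone Bridge layout; the activity names, the two-point durations, and the function name `duration_cdf` are illustrative assumptions rather than anything taken from the report's implementation.

```python
from itertools import product


def duration_cdf(activities, paths, t):
    """Return F(t) = P(max_j Y_j <= t) by complete enumeration.

    activities: dict name -> list of (value, probability) pairs;
                the activities are assumed independent.
    paths: list of paths, each a list of activity names.
    """
    names = list(activities)
    total = 0.0
    for outcome in product(*(activities[n] for n in names)):
        durations = {n: v for n, (v, _) in zip(names, outcome)}
        prob = 1.0
        for _, p in outcome:
            prob *= p
        # Y_j is the sum of the durations on path j; Y* is the longest.
        longest = max(sum(durations[a] for a in path) for path in paths)
        if longest <= t:
            total += prob
    return total


# Hypothetical Wheatstone Bridge: a (1->2), b (1->3), c (2->3),
# d (2->4), e (3->4); every duration is 1 or 2 with probability 1/2.
two_point = [(1, 0.5), (2, 0.5)]
activities = {name: two_point for name in "abcde"}
paths = [["a", "d"], ["b", "e"], ["a", "c", "e"]]
for t in (2, 3, 4, 5, 6):
    print(t, duration_cdf(activities, paths, t))
```

The number of outcomes grows exponentially with the number of activities, which is why enumeration of this kind is only practical for the simplest subnetworks.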
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The  activity  distributions  are  assumed  to  be  independent.  Thus, 
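the marginal distribution of $Y_j$, say $F_{Y_j}(t)$, is the convolution
of the path's activity duration distributions; that is,

$$F_{Y_j}(t) = \int \cdots \int F_{X_{i_1}}\Big(t - \sum_{i \ne i_1} \delta_{ij} x_i\Big) \prod_{\substack{i \ne i_1 \\ \delta_{ij} = 1}} dF_{X_i}(x_i) \tag{5.4}$$

where $i_1$ is the index of an activity on the path.  Furthermore, if
$F_{Y_j | X}(t)$ denotes the conditional distribution of $Y_j$ given a
set, X, of activity duration values, then

$$F_{Y_j | X}(t) = \int \cdots \int F_{X_{i_1}}\Big(t - \sum_{\substack{i \ne i_1 \\ X_i \notin X}} \delta_{ij} x_i\Big) \prod_{\substack{i \ne i_1,\ \delta_{ij} = 1 \\ X_i \notin X}} dF_{X_i}(x_i) \tag{5.5}$$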


which  is  the  convolution  of  the  path's  activity  durations  not  in  X. 

If there is no activity that is in two or more of $Y_{j_1}, \ldots, Y_{j_k}$
(that is, these paths have no activities in common), then
$Y_{j_1}, \ldots, Y_{j_k}$ are independent, and

$$F_{Y_{j_1}, \ldots, Y_{j_k}}(t_1, \ldots, t_k) = \prod_{l=1}^{k} F_{Y_{j_l}}(t_l). \tag{5.6}$$


However, if

X = {$X_i$ | activity i is in more than one of $Y_1, \ldots, Y_m$}

is a nonempty set, then $Y_1, \ldots, Y_m$ are dependent, and

$$F(t) = \int \cdots \int \Big[ \prod_{j=1}^{m} F_{Y_j | X}\Big(t - \sum_{X_i \in X} \delta_{ij} x_i\Big) \Big] \prod_{X_i \in X} dF_{X_i}(x_i). \tag{5.7}$$
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5.2  Explicit  Evaluation  of  the  Subnetwork 
Duration  Distribution 

For  simple  subnetworks  it  is  relatively  easy  to  identify  all  of 
the  paths  and  give  an  explicit  expression  for  F(t)  via  (5.7).  In 
particular  Hartley  and  Wortham  (1966)  considered  series,  parallel, 
and  Wheatstone  Bridge  subnetworks  (see  Figure  11) . Ringer  (1969) 
extended  this  work  to  include  Double  Wheatstone  Bridge  and  Criss-Cross 
subnetworks (see Figure 12).  These exact expressions for the
subnetwork duration distribution form the basis of Step 2, Simplification,
in  the  project  scheduling  procedure.  Interestingly,  this  implies 
that  the  subnetworks  actually  considered  in  Step  4,  Subnetwork 
Analysis,  do  not  have  any  of  these  simple  activity  configurations  in 
them,  and  hence  are  generally  fairly  complex. 

To  utilize  (5.7)  to  determine  F(t),  all  the  paths  through  the 
subnetwork  must  be  identified  and  then  the  numerical  evaluation  of 
(5.7) performed.  Martin (1964) presented a clever method for
performing the numerical evaluation of (5.7) when the activity duration
distributions  were  all  piecewise  polynomial  functions  with  finite 
ranges.  Martin’s  technique  is  most  readily  suited  to  subnetworks 
primarily  composed  of  activities  in  series  or  parallel.  Unfortunately, 
the subnetworks generally encountered in the Subnetwork Analysis step
are not of this form.  Furthermore, Martin's technique becomes
computationally impractical for large subnetworks.
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1 


Two  Activities  in  Series 


Figure  11 

Subnetworks considered by Hartley and Wortham (1966).
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Double  Wheatstone  Bridge 


Criss-Cross 


Figure  12 


Subnetworks  considered  by  Ringer  (1969). 
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5.3  Approximating  the  Subnetwork  Duration 
Distribution  F(t) 

Since  the  explicit  evaluation  of  the  exact  expression  for  the 
subnetwork  duration  distribution  F(t)  given  in  (5.7)  is  generally 
impractical  for  other  than  simple  subnetworks,  several  authors  have 
considered  approximating  F(t) . A review  of  the  classical  approximation 
procedures  is  given  in  Moder  and  Phillips  (1974) . The  more  recent 
approximation  procedures  are  essentially  based  on  either  sophisticated 
Monte  Carlo  simulation  or  the  determination  of  upper  and  lower  bounds 
for  F(t).  The  Subnetwork  Analysis  procedure  developed  in  Sections 
2 - 4 is  one  of  these  approximation  procedures.  That  Subnetwork 
Analysis procedure basically estimates F(t) by extrapolating a
sequence of upper and lower bounds on the subnetwork's discrete
duration distribution F, with specialized Monte Carlo techniques
sometimes employed in the determination of the upper and lower bounds.

Noteworthy  papers  on  the  Monte  Carlo  simulation  of  F(t)  include 
Van  Slyke  (1963),  Gaver  and  Burt  (1968),  and  Burt  and  Garman  (1971). 

The  two  outstanding  published  techniques  for  determining  upper 
and  lower  bounds  on  F(t)  are  due  to  Robillard  and  Trahan  (1977)  and 
Kleindorfer  (1971) . These  techniques  are  briefly  discussed  in 
subsections  5.3.1  and  5.3.2,  respectively. 

Subsection  5.3.3  indicates  several  ways  that  the  Monte  Carlo 
techniques  and  the  upper  and  lower  bounds  of  Kleindorfer  (1971)  and 
Robillard  and  Trahan  (1977)  can  be  incorporated  into  the  general 
Subnetwork  Analysis  procedure. 
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5.3.1  Robillard  and  Trahan’s  Lower  Bound  on  F(t) 


Robillard and Trahan (1977) proposed a lower bound, $F^-(t)$, for
F(t) based on a Bonferroni inequality.  Specifically,

$$P(\max_{1 \le j \le m} Y_j > t) = P\Big(\bigcup_{j=1}^{m} \{Y_j > t\}\Big) \le \sum_{j=1}^{m} P(Y_j > t), \tag{5.8}$$

so

$$F(t) = P(\max_{1 \le j \le m} Y_j \le t) = 1 - P(\max_{1 \le j \le m} Y_j > t) \ge 1 - \sum_{j=1}^{m} P(Y_j > t)$$

$$= 1 - \sum_{j=1}^{m} [1 - F_{Y_j}(t)] = 1 - m + \sum_{j=1}^{m} F_{Y_j}(t) = F^-(t). \tag{5.9}$$

Robillard and Trahan (1977) evaluate the term $\sum_{j=1}^{m} F_{Y_j}(t)$
using the characteristic functions, say $\psi_{Y_1}(\tau), \ldots, \psi_{Y_m}(\tau)$,
of $Y_1, \ldots, Y_m$.  Let $I_t$ denote the integration corresponding to
the inversion of a characteristic function.  Then

$$\sum_{j=1}^{m} F_{Y_j}(t) = \sum_{j=1}^{m} I_t[\psi_{Y_j}(\tau)] = I_t\Big[\sum_{j=1}^{m} \psi_{Y_j}(\tau)\Big] \tag{5.10}$$

where the last equality follows from the linearity of integration.
For example, if $Y_1, \ldots, Y_m$ are all continuous random variables, then

$$\sum_{j=1}^{m} I_t[\psi_{Y_j}(\tau)] = \sum_{j=1}^{m} \Big\{ \frac{1}{2\pi} \int_{-\infty}^{t} \int_{-\infty}^{\infty} e^{-i\tau x} \psi_{Y_j}(\tau)\, d\tau\, dx \Big\} = \frac{1}{2\pi} \int_{-\infty}^{t} \int_{-\infty}^{\infty} e^{-i\tau x} \sum_{j=1}^{m} \psi_{Y_j}(\tau)\, d\tau\, dx$$

$$= I_t\Big[\sum_{j=1}^{m} \psi_{Y_j}(\tau)\Big]. \tag{5.11}$$

Although $\psi_{Y_j}(\tau)$ can be written as the product of the
characteristic functions for the individual activities in $Y_j$, this
approach to evaluating $\sum_{j=1}^{m} \psi_{Y_j}(\tau)$ would require the
explicit enumeration of all the subnetwork's paths, which is
computationally impractical for large complex subnetworks.  Therefore
Robillard and Trahan (1977) developed the following recursive scheme
for evaluating $\sum_{j=1}^{m} \psi_{Y_j}(\tau)$.  Let
$\psi_{X_1}(\tau), \ldots, \psi_{X_n}(\tau)$ denote the characteristic
functions of the individual activity durations $X_1, \ldots, X_n$.  Let
the k-th activity originate at node $\mathrm{Orig}_k$ and terminate at
node $\mathrm{Term}_k$.  If

$$B_i = \{k \mid \mathrm{Term}_k = i\}, \tag{5.12}$$

$$\phi(\tau, 1) = 1, \text{ and} \tag{5.13}$$

$$\phi(\tau, i) = \sum_{k \in B_i} \phi(\tau, \mathrm{Orig}_k)\, \psi_{X_k}(\tau), \quad i = 2, \ldots, N, \tag{5.14}$$

where N is the number of nodes, then

$$\sum_{j=1}^{m} \psi_{Y_j}(\tau) = \phi(\tau, N). \tag{5.15}$$


Although it is not explicitly noted by Robillard and Trahan (1977),
the number of paths, m, can also be recursively generated.  If

$$m_1 = 1 \tag{5.16}$$

and

$$m_i = \sum_{k \in B_i} m_{\mathrm{Orig}_k}, \quad i = 2, \ldots, N, \tag{5.17}$$

then

$$m = m_N. \tag{5.18}$$
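
The two node-by-node recursions can be sketched as follows for discrete activity durations. This is an illustrative sketch, not Robillard and Trahan's code: it assumes nodes are numbered 1..N so that every activity runs from a lower-numbered node to a higher-numbered one, and the names `char_fn`, `phi_terminal`, and `count_paths` are hypothetical.

```python
import cmath


def char_fn(dist, tau):
    """Characteristic function of a discrete duration: sum of p * exp(i*tau*x)."""
    return sum(p * cmath.exp(1j * tau * x) for x, p in dist)


def phi_terminal(arcs, n_nodes, tau):
    """Sum over all paths of the path characteristic functions at tau.

    arcs: list of (orig, term, dist) with orig < term (assumed numbering).
    """
    phi = [0j] * (n_nodes + 1)
    phi[1] = 1 + 0j
    for i in range(2, n_nodes + 1):
        # Accumulate over the activities that terminate at node i.
        phi[i] = sum(
            (phi[orig] * char_fn(dist, tau) for orig, term, dist in arcs if term == i),
            0j,
        )
    return phi[n_nodes]


def count_paths(arcs, n_nodes):
    """Number of source-to-sink paths, by the same recursion with counts."""
    m = [0] * (n_nodes + 1)
    m[1] = 1
    for i in range(2, n_nodes + 1):
        m[i] = sum(m[orig] for orig, term, _ in arcs if term == i)
    return m[n_nodes]


# Hypothetical Wheatstone Bridge with identical two-point durations.
d = [(1, 0.5), (2, 0.5)]
arcs = [(1, 2, d), (1, 3, d), (2, 3, d), (2, 4, d), (3, 4, d)]
print(count_paths(arcs, 4))
print(phi_terminal(arcs, 4, 0.7))
```

Since every characteristic function equals 1 at tau = 0, evaluating `phi_terminal` at tau = 0 reproduces the path count, which gives a cheap consistency check between the two recursions.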

Apart from any numerical inaccuracies in the computation (5.14),
the tightness of the lower bound $F^-(t)$ is the same as the tightness
of the Bonferroni inequality (5.8).
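
The behaviour of the Bonferroni-type lower bound 1 - m + (sum of the marginal path c.d.f.s) can be seen on a small example. In this sketch the marginal c.d.f.s are obtained by brute-force enumeration rather than by characteristic-function inversion, purely to keep the example short; the network, durations, and function names are hypothetical.

```python
from itertools import product


def joint_outcomes(activities):
    """Yield (durations, probability) for independent discrete activities."""
    names = list(activities)
    for outcome in product(*(activities[n] for n in names)):
        prob = 1.0
        for _, p in outcome:
            prob *= p
        yield {n: v for n, (v, _) in zip(names, outcome)}, prob


def exact_and_bonferroni(activities, paths, t):
    """Return (exact F(t), lower bound 1 - m + sum_j P(Y_j <= t))."""
    exact = 0.0
    marginals = [0.0] * len(paths)
    for durations, prob in joint_outcomes(activities):
        lengths = [sum(durations[a] for a in path) for path in paths]
        if max(lengths) <= t:
            exact += prob
        for j, y in enumerate(lengths):
            if y <= t:
                marginals[j] += prob
    lower = 1.0 - len(paths) + sum(marginals)
    return exact, lower


# Hypothetical Wheatstone Bridge with two-point durations.
two_point = [(1, 0.5), (2, 0.5)]
activities = {name: two_point for name in "abcde"}
paths = [["a", "d"], ["b", "e"], ["a", "c", "e"]]
for t in (3, 4, 5):
    print(t, exact_and_bonferroni(activities, paths, t))
```

In the lower tail the bound can be negative and therefore uninformative, while in the upper tail, where at most one path is likely to exceed t, it is close to or equal to the exact value.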
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Robillard  and  Trahan  (1977)  also  note  that  another  Bonferroni 
inequality implies that

$$P(\max_{1 \le j \le m} Y_j > t) = P\Big(\bigcup_{j=1}^{m} \{Y_j > t\}\Big) \ge \sum_{j=1}^{m} P(Y_j > t) - \sum_{i<j} P(Y_i > t,\ Y_j > t). \tag{5.19}$$

Unfortunately, to use (5.19) as the basis for an upper bound on F(t)
seems to require the explicit enumeration of the paths because of the
joint nature of $P(Y_i > t, Y_j > t)$ and the lack of a convenient upper
bound for $P(Y_i > t, Y_j > t)$.

5.3.2  Kleindorfer's Upper and Lower Bounds on F(t)

Let the subnetwork's activities be numbered i = 1, ..., n in
such a way that, if i < j and both activities i and j are on a path,
then activity i precedes activity j.  Let $A_i$ denote the set of
activities which immediately precede activity i on some path.  As
before, let

$X_i$ = the duration of activity i

and also define

$U_i$ = the earliest time at which activity i can commence,

$P_i(t) = P(U_i \le t)$,

$V_i = U_i + X_i$ = the completion time for activity i, and

$Q_i(t) = P(V_i \le t)$.  (5.20)

Kleindorfer (1971) proposed upper and lower bounds for F(t) by
recursively defining upper bounds, $P'_i(t)$ and $Q'_i(t)$, and lower
bounds, $P''_i(t)$ and $Q''_i(t)$, on $P_i(t)$ and $Q_i(t)$, respectively.
The upper bounds $P'_i(t)$ are based upon the inequality

$$\min_{j \in A_i} P(V_j \le t) \ge P(\max_{j \in A_i} V_j \le t) = P_i(t), \tag{5.21}$$

and $Q'_i(t)$ is simply the convolution of $F_{X_i}(t)$ and $P'_i(t)$.
The lower bounds $P''_i(t)$ are based upon the inequality

$$P_i(t) = P(\max_{j \in A_i} V_j \le t) \ge \prod_{j \in A_i} P(V_j \le t), \tag{5.22}$$

and $Q''_i(t)$ is the convolution of $F_{X_i}(t)$ and $P''_i(t)$.
(Although Kleindorfer proves a version of (5.22), the inequality as
stated follows from the more general results of Esary, Proschan, and
Walkup (1967).)

The recursive relations for $P'_i(t)$, $Q'_i(t)$, $P''_i(t)$, and
$Q''_i(t)$ are as follows:  For notational convenience assume that
activity 1 is an activity with zero duration which precedes the rest
of the subnetwork and that activity n is an activity with zero
duration which follows the completion of the rest of the subnetwork.
Furthermore, assume that $X_i$ is a discrete, nonnegative random
variable taking on values in S for all i.  Then, for $t \ge 0$,

$$P'_1(t) = P_1(t) = Q_1(t) = Q'_1(t) = 1, \tag{5.23}$$

$$P'_i(t) = \min_{j \in A_i} Q'_j(t), \quad i = 2, \ldots, n, \tag{5.24}$$

$$Q'_i(t) = \sum_{s \in S} P(X_i = s)\, P'_i(t - s), \quad i = 2, \ldots, n, \tag{5.25}$$

$$P''_1(t) = P_1(t) = Q_1(t) = Q''_1(t) = 1, \tag{5.26}$$

$$P''_i(t) = \prod_{j \in A_i} Q''_j(t), \quad i = 2, \ldots, n, \text{ and} \tag{5.27}$$

$$Q''_i(t) = \sum_{s \in S} P(X_i = s)\, P''_i(t - s), \quad i = 2, \ldots, n. \tag{5.28}$$

Finally,

$$P''_n(t) \le F(t) \le P'_n(t). \tag{5.29}$$

The computational beauty of these bounds on F(t) is that they do
not necessitate the enumeration of all of the subnetwork's paths.
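
The min/product recursions behind these bounds are short enough to sketch directly. The code below is an illustrative reading of the recursions for discrete, nonnegative durations, not Kleindorfer's own program; the data layout (`dists`, `preds`) and the function names are assumptions, and activities 1 and n are taken to be zero-duration dummies as in the text.

```python
from functools import lru_cache
from math import prod


def kleindorfer_bounds(dists, preds, t):
    """Return (lower, upper) bounds on F(t).

    dists[i]: list of (value, probability) for activity i = 1..n.
    preds[i]: tuple of the immediate predecessors of activity i (i >= 2).
    """
    n = max(dists)

    @lru_cache(maxsize=None)
    def start(i, s, upper):
        # Bound on P(U_i <= s): min over predecessors for the upper
        # bound, product over predecessors for the lower bound.
        if s < 0:
            return 0.0
        if i == 1:
            return 1.0
        values = [finish(j, s, upper) for j in preds[i]]
        return min(values) if upper else prod(values)

    @lru_cache(maxsize=None)
    def finish(i, s, upper):
        # Bound on P(V_i <= s): convolve X_i with the start-time bound.
        return sum(p * start(i, s - x, upper) for x, p in dists[i])

    return start(n, t, False), start(n, t, True)


# Hypothetical example: activities 2 and 3 in parallel between the
# zero-duration dummy activities 1 and 4.
dists = {1: [(0, 1.0)], 2: [(1, 0.5), (2, 0.5)], 3: [(1, 0.5), (3, 0.5)], 4: [(0, 1.0)]}
preds = {2: (1,), 3: (1,), 4: (2, 3)}
for t in (1, 2, 3):
    print(t, kleindorfer_bounds(dists, preds, t))
```

In this example the two parallel activities are independent, so the product (lower) bound coincides with the exact distribution while the min (upper) bound is loose at t = 1.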

The tightness of these bounds on F(t) depends on the structure of
the subnetwork.  Since the recursive relations (5.23) - (5.28)
sequentially bound the $P_i(t)$ in terms of the bounds for the completion
time distributions of the activities immediately preceding activity i,
the differences $P'_i(t) - P_i(t)$ and $P_i(t) - P''_i(t)$ essentially
cumulate as i increases.  Therefore, the bounds on F(t) will generally
tend to be tighter the shorter the subnetwork's paths.  Furthermore, the
difference

$$\min_{j \in A_i} P(V_j \le t) - P(\max_{j \in A_i} V_j \le t) \tag{5.30}$$

tends to decrease as the $V_j$'s have more and more activities in common;
whereas, the difference

$$P(\max_{j \in A_i} V_j \le t) - \prod_{j \in A_i} P(V_j \le t) \tag{5.31}$$

tends to increase as the $V_j$'s have more and more activities in common.
Thus subnetwork structures that lead to tight upper bounds on F(t)
lead to loose lower bounds on F(t), and vice versa.  Of course, the
tightness of both the upper and lower bounds tends to decrease as
the number of paths increases.
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5.3.3  Incorporating  Different  Methods  of  Approximating  F(t)  into 
Subnetwork  Analysis 

The  Monte  Carlo  simulation  techniques  and  the  bounding  procedures 
of  Kleindorfer  (1971)  and  Robillard  and  Trahan  (1977)  referred  to  thus 
far  in  subsection  5.3  could  be  used  to  modify  the  current  Subnetwork 
Analysis  procedure  discussed  in  Sections  2-4.  Since  the  empirical 
experience  with  the  modifications  to  be  briefly  described  in  the 
remainder  of  this  subsection  is  generally  extremely  limited,  these 
potential  modifications  are  really  subjects  for  future  research. 

A Monte Carlo simulation of the subnetwork duration distribution
F(t) could, of course, essentially replace the current Subnetwork
Analysis procedure.  A less radical revision would be to carry out
the cluster formation procedure described in subsection 2.2 for a
fixed (presumably large) value of $(\theta, \lambda)$; let IMPORTANT be
the set of all activities in the union of the clusters; and then
estimate F(t) by fixing the durations of the activities not in
IMPORTANT at their mean values and doing a Monte Carlo simulation of
the durations for the activities in IMPORTANT.  The durations of the
activities in IMPORTANT could be simulated from either their actual
distributions or their approximate two-point discrete distributions.
Another potential modification would be to perform the current
Subnetwork Analysis procedure as is except that the upper and lower
bounds $F^+(C;t)$ and $F^-(C;t)$ used in determining
$F^-(t; \theta, \lambda)$ and $F^+(t; \theta, \lambda)$ could be
determined with the durations for activities in C determined by a
Monte Carlo simulation of their actual duration distributions or of
their two-point discrete distributions.
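
The "less radical revision" described above can be sketched as a small simulation in which only the activities in IMPORTANT are sampled and all others are held at their means. This is a hedged illustration only: the set IMPORTANT is chosen by hand here (the cluster formation procedure of subsection 2.2 is not reproduced), and the network, durations, and the name `simulate_cdf` are hypothetical.

```python
import random


def simulate_cdf(activities, paths, important, t, n_draws=10000, seed=0):
    """Monte Carlo estimate of F(t) with non-IMPORTANT durations fixed at their means.

    activities: dict name -> list of (value, probability) pairs.
    important: iterable of activity names whose durations are sampled.
    """
    rng = random.Random(seed)
    means = {a: sum(v * p for v, p in dist) for a, dist in activities.items()}
    hits = 0
    for _ in range(n_draws):
        durations = dict(means)  # start every activity at its mean duration
        for a in important:
            values, probs = zip(*activities[a])
            durations[a] = rng.choices(values, weights=probs)[0]
        longest = max(sum(durations[a] for a in path) for path in paths)
        if longest <= t:
            hits += 1
    return hits / n_draws


# Hypothetical Wheatstone Bridge with two-point durations.
two_point = [(1, 0.5), (2, 0.5)]
activities = {name: two_point for name in "abcde"}
paths = [["a", "d"], ["b", "e"], ["a", "c", "e"]]
print(simulate_cdf(activities, paths, important="abcde", t=3))
print(simulate_cdf(activities, paths, important="ace", t=4))
```

Sampling every activity gives an ordinary Monte Carlo estimate of F(t); restricting IMPORTANT trades some bias, from replacing the remaining durations by their means, for fewer random draws per replication.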
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Another  possible  replacement  for  the  current  Subnetwork  Analysis 
procedure would be to determine Kleindorfer's upper bound on F(t)
and either Kleindorfer's or Robillard and Trahan's lower bound on F(t)
and then use the average of these two bounding distributions as the
estimate of F(t).  (Of course, the maximum of Kleindorfer's and
Robillard and Trahan's lower bounds is also a valid lower bound.)

Again a less radical revision would be to carry out the cluster
formation procedure described in subsection 2.2 for a fixed (presumably
large) value of $(\theta, \lambda)$; let IMPORTANT be the set of all
activities in the union of the clusters; and then estimate F(t) by
fixing the durations of the activities not in IMPORTANT at their mean
values and averaging the upper and lower bounds for the subnetwork
duration distribution when the durations for the activities in
IMPORTANT have either their actual distributions or their two-point
discrete distributions.  The durations for the activities not in
IMPORTANT could, alternatively, be fixed at their lower values when the
upper bound is being determined and be fixed at their upper values when
the lower bound is being determined.  Finally, another potential
modification would be to perform the current Subnetwork Analysis
procedure as is except that the upper and lower bounds $F^+(C;t)$ and
$F^-(C;t)$ could be either Kleindorfer's or Robillard and Trahan's
bounds determined with the durations for the activities in C having
either their actual distributions or their two-point discrete
distributions.

Presumably,  a project  scheduler  might  settle  for  a project 
schedule  which  has  the  probability  of  the  project’s  completion  by 
the  specified  deadline  bounded  from  below  by  a specified  amount. 
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In  such  instances  lower  bounds  on  the  subnetwork  duration  distributions 
suffice.  Then  the  Subnetwork  Analysis  procedure  could  be  replaced 
by a procedure which simply determines either Kleindorfer's or
Robillard and Trahan's lower bound.  Alternatively, the cluster
formation procedure could be carried out for a specified value of
$(\theta, \lambda)$, the set IMPORTANT of all activities in the union of
the clusters formed, and then either Kleindorfer's or Robillard and
Trahan's lower bound computed with the durations for the activities
outside IMPORTANT fixed and the durations for the activities in
IMPORTANT having either their actual distributions or their two-point
discrete distributions.


5.4  Additional  Probability  Inequalities  as  Bases  for 
Upper  and  Lower  Bounds  on  F(t) 


In addition to the ones cited in subsections 5.3.1 and 5.3.2,
there are other known probability inequalities which imply upper and
lower bounds on $F(t) = P(\max_{1 \le j \le m} Y_j \le t)$.  Three upper
bounds on $P(\max_{1 \le j \le m} Y_j \le t)$ and the authors who
proposed them are:

(i)  Chung and Erdos (1952),

$$P(\max_{1 \le j \le m} Y_j \le t) \le 1 - \Big\{ \Big[\sum_{j=1}^{m} P(Y_j > t)\Big]^2 \Big/ \Big[\sum_{j=1}^{m} P(Y_j > t) + \sum_{i \ne j} P(Y_i > t,\ Y_j > t)\Big] \Big\}; \tag{5.32}$$

(ii)  Dawson and Sankoff (1967),

$$P(\max_{1 \le j \le m} Y_j \le t) \le 1 - \frac{2}{r+1} \Big[ \sum_{j=1}^{m} P(Y_j > t) - \frac{1}{r} \sum_{i<j} P(Y_i > t,\ Y_j > t) \Big] \tag{5.33}$$

where r is the greatest integer less than or equal to

$$1 + 2 \sum_{i<j} P(Y_i > t,\ Y_j > t) \Big/ \sum_{j=1}^{m} P(Y_j > t); \tag{5.34}$$

and (iii) Kounias (1968),

$$
P\Big(\max_{1 \le j \le m} Y_j \le t\Big) \le 1 - \Big\{\sum_{j \in L} P(Y_j > t) - \sum_{\substack{i<j \\ i,j \in L}} P(Y_i > t,\, Y_j > t)\Big\},
\tag{5.35}
$$

where L is any subset of {1, 2, ..., m} with two or more elements.
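As an illustration, the three upper bounds can be evaluated directly once the marginal probabilities P(Y_j > t) and the pairwise joint probabilities P(Y_i > t, Y_j > t) are available. The following Python sketch is only a minimal example under stated assumptions: the function names, the list `p` of marginals, and the symmetric matrix `q` of joint probabilities are hypothetical and are not part of the computer system described in this report. The Dawson and Sankoff bound is coded in its standard form, with r equal to one plus the integer part of twice the pairwise sum divided by the marginal sum, and the Kounias bound is maximized over L by exhaustive search, which is practical only for small m.

```python
from itertools import combinations
from math import floor


def pair_sum(q, idx):
    """Sum of q[i][j] over unordered pairs i < j drawn from the indices idx."""
    return sum(q[i][j] for i, j in combinations(idx, 2))


def chung_erdos_upper(p, q):
    """Chung-Erdos upper bound on P(max Y_j <= t)."""
    s1 = sum(p)
    if s1 == 0.0:
        return 1.0  # no path can exceed t
    s2 = pair_sum(q, range(len(p)))
    # The sum over ordered pairs i != j equals twice the sum over i < j.
    return 1.0 - s1 ** 2 / (s1 + 2.0 * s2)


def dawson_sankoff_upper(p, q):
    """Dawson-Sankoff upper bound on P(max Y_j <= t), standard form."""
    s1 = sum(p)
    if s1 == 0.0:
        return 1.0
    s2 = pair_sum(q, range(len(p)))
    r = 1 + floor(2.0 * s2 / s1)
    return 1.0 - (2.0 * s1 / (r + 1) - 2.0 * s2 / (r * (r + 1)))


def kounias_upper(p, q):
    """Kounias upper bound, using the best subset L (requires m >= 2)."""
    m = len(p)
    best = max(
        sum(p[j] for j in L) - pair_sum(q, L)
        for size in range(2, m + 1)
        for L in combinations(range(m), size)
    )
    return 1.0 - best


if __name__ == "__main__":
    # Hypothetical probabilities for three paths (illustrative values only).
    p = [0.5, 0.5, 0.25]
    q = [[0.0, 0.25, 0.125],
         [0.25, 0.0, 0.125],
         [0.125, 0.125, 0.0]]
    for bound in (chung_erdos_upper, dawson_sankoff_upper, kounias_upper):
        print(bound.__name__, bound(p, q))
```

The diagonal of `q` is never read, so its entries are arbitrary. Since every function returns a valid upper bound, the smallest of the three values is the one to retain.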

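A lower bound, proposed by Hunter (1976), is

$$
P\Big(\max_{1 \le j \le m} Y_j \le t\Big) \ge 1 - \sum_{j=1}^{m} P(Y_j > t) + \sum_{(i,j) \in T} P(Y_i > t,\, Y_j > t),
\tag{5.36}
$$

where T is any connected set of m - 1 pairs (i,j) such that either
(·,k) or (k,·) is in the set for each k = 1, ..., m; that is, T is the
edge set of a spanning tree on {1, ..., m}.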
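Because the bound holds for any such T, it is sharpest when the sum of the joint probabilities over T is as large as possible, that is, when T is a maximum-weight spanning tree of the complete graph whose edge (i,j) carries the weight P(Y_i > t, Y_j > t). The Python sketch below illustrates this using Prim's algorithm. It is a minimal example under stated assumptions: the function name `hunter_lower` and the inputs `p` (marginal probabilities) and `q` (symmetric matrix of joint probabilities) are hypothetical.

```python
def hunter_lower(p, q):
    """Hunter-type lower bound on P(max Y_j <= t).

    p[j]    = P(Y_j > t)
    q[i][j] = P(Y_i > t, Y_j > t), symmetric; the diagonal is ignored.
    T is chosen as a maximum-weight spanning tree (Prim's algorithm).
    """
    m = len(p)
    in_tree = [False] * m
    in_tree[0] = True
    # best[j] = heaviest edge joining node j to the current tree
    best = [q[0][j] for j in range(m)]
    tree_weight = 0.0
    for _ in range(m - 1):
        # Add the outside node attached by the heaviest available edge.
        k = max((j for j in range(m) if not in_tree[j]), key=lambda j: best[j])
        tree_weight += best[k]
        in_tree[k] = True
        for j in range(m):
            if not in_tree[j] and q[k][j] > best[j]:
                best[j] = q[k][j]
    # A probability is never negative, so clipping at 0 keeps the bound valid.
    return max(0.0, 1.0 - sum(p) + tree_weight)


if __name__ == "__main__":
    # Hypothetical probabilities for three paths (illustrative values only).
    p = [0.5, 0.5, 0.25]
    q = [[0.0, 0.25, 0.125],
         [0.25, 0.0, 0.125],
         [0.125, 0.125, 0.0]]
    print(hunter_lower(p, q))
```

Building the tree takes on the order of m² operations, so once the pairwise joint probabilities are in hand the choice of T adds little to the cost.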

The primary difficulty in evaluating these bounds is that the
subnetwork's paths must be explicitly enumerated in order to compute
the joint probabilities $P(Y_i > t,\, Y_j > t)$. Should this computational
difficulty be overcome, however, the bounds could be incorporated into
the Subnetwork Analysis procedure as per the discussion in subsection
5.3.3.
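To make the computational burden concrete, the following Python sketch enumerates every path of a small subnetwork and then estimates the marginal and pairwise exceedance probabilities by simple random sampling. Everything in it is an illustrative assumption rather than the report's sampling procedure: the five-activity network `ARCS`, the independent uniform activity durations in `RANGES`, and all function names are hypothetical.

```python
import random
from itertools import combinations

# Hypothetical subnetwork: activity name -> (tail node, head node).
ARCS = {"a": (1, 2), "b": (1, 3), "c": (2, 3), "d": (2, 4), "e": (3, 4)}
# Hypothetical independent uniform duration ranges for each activity.
RANGES = {"a": (2, 6), "b": (4, 9), "c": (1, 3), "d": (3, 8), "e": (2, 5)}


def enumerate_paths(arcs, source, sink):
    """Return every source-to-sink path as a list of activity names."""
    out = {}
    for name, (tail, head) in arcs.items():
        out.setdefault(tail, []).append((name, head))
    paths = []

    def walk(node, so_far):
        if node == sink:
            paths.append(list(so_far))
            return
        for name, nxt in out.get(node, []):
            so_far.append(name)
            walk(nxt, so_far)
            so_far.pop()

    walk(source, [])
    return paths


def exceedance_probabilities(paths, t, n_samples, rng):
    """Estimate p[j] = P(Y_j > t) and q[i][j] = P(Y_i > t, Y_j > t)."""
    m = len(paths)
    single = [0] * m
    joint = [[0] * m for _ in range(m)]
    for _ in range(n_samples):
        d = {a: rng.uniform(lo, hi) for a, (lo, hi) in RANGES.items()}
        # Y_j is the sum of the activity durations along path j.
        over = [sum(d[a] for a in path) > t for path in paths]
        for j in range(m):
            single[j] += over[j]
        for i, j in combinations(range(m), 2):
            if over[i] and over[j]:
                joint[i][j] += 1
                joint[j][i] += 1
    p = [c / n_samples for c in single]
    q = [[c / n_samples for c in row] for row in joint]
    return p, q


if __name__ == "__main__":
    paths = enumerate_paths(ARCS, 1, 4)
    p, q = exceedance_probabilities(paths, t=10.0, n_samples=5000,
                                    rng=random.Random(0))
    print(paths)
    print(p)
    print(q)
```

The number of paths can grow exponentially with the size of the subnetwork, and the number of pairs grows with the square of the number of paths, which is why explicit enumeration is the limiting step.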
6. CONCLUDING REMARKS

This report had as its goal the improvement and implementation
of a new project scheduling procedure currently being developed
at the Institute of Statistics, Texas A&M University. The project
scheduling procedure has been improved by significantly extending
the critical Subnetwork Analysis procedure. In particular, a
suitable sampling procedure and estimator for bounds on the
subnetwork's duration distribution, F(t), have been developed and
incorporated. In addition, a procedure for extrapolating upper and
lower bounds on F(t) to obtain an estimate of F(t) has been determined
and implemented.

A computer  system  implementing  the  project  scheduling  procedure 
(including  the  improvements  in  Subnetwork  Analysis)  has  been  prepared 
and is documented in Baker and Sielken (1978).

In  addition,  some  possible  alternatives  to  the  current  Subnetwork 
Analysis  procedure  have  been  suggested.  These  alternatives  are 
interesting  topics  for  future  research. 